Consulting @ Gregory Lane

Convert Word Special Characters to Plain Text

by Gary Randolph

Clients often send me documents in Microsoft Word, and I have to convert them into HTML. Word's curly quotes and other special characters don't survive the trip into HTML well, which used to mean a lot of manual scanning and retyping. So I wrote this utility to do the conversions for me.

If you want to read about the C# ASP.NET coding behind the program, scroll on down. Otherwise, feel free to use it yourself.


How It Works

This C# ASP.NET program is extremely simple. First, the input text is converted into a char array. Then the code loops through the array. All Microsoft Word special characters are caught with the switch statement and their plain text equivalents are added to the output string. Non-special characters are added to the output string as is. Finally, the output string is written to the output textbox.

private void btnConvert_Click(object sender, System.EventArgs e)
{
  char [] charList = txtWord.Text.ToCharArray();
  char quote=(char)34; //double quote character (code 34); the escaped literal "\"" would also work
  string plainText="";

  for (int counter=0; counter<charList.Length; counter++)
  {
    int thisChar = Convert.ToInt32(charList[counter]);
    switch(thisChar)
    {
      case 8217: //curly apostrophe
        plainText += "'";
        break;
      case 8230: //ellipsis
        plainText += "...";
        break;
      case 8220: //left curly quote
        plainText += quote.ToString();
        break;
      case 8221: //right curly quote
        plainText += quote.ToString();
        break;
      default:
        plainText += charList[counter].ToString();
        break;
    }
  }
  txtConvert.Text = plainText;
}
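
If you want to try the same mapping outside of a web form, the loop can be pulled into a plain method. This is only a sketch, not the code running behind this page: the names WordTextCleaner and ToPlainText are made up for illustration, and it uses a StringBuilder instead of string concatenation, which is a swap of my own choosing rather than what the handler above does. It covers the same four characters and nothing more.

```csharp
using System;
using System.Text;

// Hypothetical helper class; not part of the original page.
public static class WordTextCleaner
{
    // Applies the same four-character mapping as btnConvert_Click,
    // but takes and returns a string instead of touching textboxes.
    public static string ToPlainText(string wordText)
    {
        StringBuilder plainText = new StringBuilder(wordText.Length);
        foreach (char c in wordText)
        {
            switch ((int)c)
            {
                case 8217: // curly apostrophe
                    plainText.Append('\'');
                    break;
                case 8230: // ellipsis
                    plainText.Append("...");
                    break;
                case 8220: // left curly quote
                case 8221: // right curly quote
                    plainText.Append('"');
                    break;
                default:   // everything else passes through unchanged
                    plainText.Append(c);
                    break;
            }
        }
        return plainText.ToString();
    }
}
```

For example, WordTextCleaner.ToPlainText("\u201CIt\u2019s done\u2026\u201D") returns the text "It's done..." wrapped in straight double quotes.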

About the only other thing worth mentioning is how I discovered which characters were Word special characters. During development I added an if test to the loop that caught all characters with a code over 255 and captured them in a listbox (see the code below). Then I could just read them off the screen.

if (thisChar>255)
{
  ListItem newItem = new ListItem();
  newItem.Text = thisChar.ToString() + " - " + charList[counter].ToString();
  newItem.Value = thisChar.ToString();
  lbChars.Items.Add(newItem);
}
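
The same check can be sketched without a listbox, returning the entries as a list of strings instead. Again, this is an illustration under my own assumptions: SpecialCharFinder and FindHighChars are hypothetical names, and the 255 cutoff is simply the one used in the test above. Like the listbox version, it reports a character every time it appears, so repeats show up more than once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; not part of the original page.
public static class SpecialCharFinder
{
    // Returns one "code - character" entry for each character whose code is over 255.
    public static List<string> FindHighChars(string text)
    {
        List<string> found = new List<string>();
        foreach (char c in text)
        {
            int code = (int)c;
            if (code > 255)
            {
                found.Add(code.ToString() + " - " + c.ToString());
            }
        }
        return found;
    }
}
```

Calling SpecialCharFinder.FindHighChars on a pasted Word paragraph gives you the character codes to add as new cases in the switch statement.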

Copyright (c) 2007

Gary Randolph
2304 Gregory Lane
Anderson, IN 46012
Phone: 765.683.0309
E-mail: gr@gregorylane.com