When you download XML text from the Web, you may find “garbage characters” at the start of your XML string.  For example, I encountered this result when I downloaded an XML string using the WebClient.DownloadString method:

ï»¿<Root><Item>Hello, World</Item></Root>

What you are likely seeing is a Byte Order Mark (BOM), the Unicode character U+FEFF, which indicates the endianness (byte order) of a text file or stream.  The BOM is optional; when present, it appears at the very start of the text stream.  It can also indicate which of the several Unicode encodings the text uses.

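The most common BOMs, shown as they appear when the bytes are misread as single-byte (Latin-1 / Windows-1252) text, are:

ï»¿ = EF BB BF in hex = UTF-8

þÿ = FE FF in hex = UTF-16 (Big Endian)

ÿþ = FF FE in hex = UTF-16 (Little Endian)

□□þÿ = 00 00 FE FF in hex = UTF-32 (Big Endian)

ÿþ□□ = FF FE 00 00 in hex = UTF-32 (Little Endian)

In every case the encoded character is the same: Unicode code point U+FEFF (decimal 65279, Zero Width No-Break Space).  It is not an ASCII code; only its byte representation differs between encodings.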
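The byte patterns listed above can be checked directly when you have the raw bytes rather than an already-decoded string.  The following is a minimal sketch, not a definitive implementation; the class name BomSniffer and the methods DetectBom and StartsWith are hypothetical names chosen for illustration, and the returned labels simply mirror the list above.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Returns a label for the encoding implied by a leading BOM,
    // or null when the data does not start with one of the five BOMs listed above.
    public static string DetectBom(byte[] data)
    {
        if (data == null) return null;

        // The 4-byte UTF-32 marks are tested first because FF FE 00 00
        // also begins with the UTF-16 (Little Endian) mark FF FE.
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32 (Big Endian)";
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 (Little Endian)";
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF-16 (Big Endian)";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF-16 (Little Endian)";
        return null;
    }

    // True when data begins with exactly the given prefix bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

The order of the checks matters: testing FF FE before FF FE 00 00 would report UTF-32 (Little Endian) data as UTF-16.  The reverse ambiguity remains, since UTF-16 (Little Endian) text that begins with a BOM followed by a NUL character has the same first four bytes, so treat the result as a hint rather than proof.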

If you try to parse an XML string with a BOM using an XmlTextReader, for example, you will see an error message such as:

Data at the root level is invalid. Line 1, position 1.

Here is some simple code to strip the BOM from an XML string:

// Drop everything before the first '<', which removes a leading BOM.
int index = xml.IndexOf('<');
if (index > 0)
    xml = xml.Substring(index);
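The snippet above discards everything in front of the first '<', whatever it happens to be.  If you would rather remove only the BOM itself, a narrower variant might look like the sketch below.  It assumes the string was decoded correctly, so that the BOM shows up as the single character U+FEFF; a mis-decoded sequence such as ï»¿ is not handled.  The names XmlBomStripper and StripLeadingBom are hypothetical.

```csharp
using System;

public static class XmlBomStripper
{
    // U+FEFF is the BOM character once the text has been decoded correctly.
    private const char Bom = '\uFEFF';

    // Removes a single leading BOM character and leaves everything else untouched.
    // Null and empty strings are returned unchanged.
    public static string StripLeadingBom(string xml)
    {
        if (!string.IsNullOrEmpty(xml) && xml[0] == Bom)
            return xml.Substring(1);
        return xml;
    }
}
```

A call such as xml = XmlBomStripper.StripLeadingBom(xml); just before handing the string to the parser would be the typical use.  The trade-off is that this version leaves other stray leading characters in place, which the IndexOf approach would have removed.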
