Force XmlWriter or XmlTextWriter to Use an Encoding Other Than UTF-16


You may have noticed that the first line of XML output generated by XmlWriter or XmlTextWriter declares the encoding as UTF-16:

<?xml version="1.0" encoding="utf-16"?>

This happens even if you explicitly set the Encoding property in the XmlWriterSettings to something different, such as UTF-8:

using System.Text;
using System.Xml;

StringBuilder sb = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.UTF8; // not honored when the target is a StringBuilder
XmlWriter writer = XmlWriter.Create(sb, settings);
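To see the behavior end to end, here is a minimal, self-contained sketch that writes a tiny document with the settings above and returns the text. The class and method names (`EncodingDemo`, `WriteWithUtf8Setting`) and the `<Root>` element are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text;
using System.Xml;

public static class EncodingDemo
{
    // Writes a one-element document into a StringBuilder while asking
    // XmlWriterSettings for UTF-8, then returns the generated text.
    public static string WriteWithUtf8Setting()
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { Encoding = Encoding.UTF8 };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("Root", "Hello, World");
            writer.WriteEndDocument();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The declaration still reads encoding="utf-16" despite the UTF-8 setting.
        Console.WriteLine(WriteWithUtf8Setting());
    }
}
```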

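The problem occurs because StringWriter reports its encoding as UTF-16, and when XmlWriter writes to a TextWriter it ignores XmlWriterSettings.Encoding and declares the writer's encoding instead. (It's not obvious from the example above, but XmlWriter.Create wraps the StringBuilder in a StringWriter before writing any XML to it.)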
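One common workaround, which is not necessarily the approach taken in the full entry, is to subclass StringWriter so that its `Encoding` property reports UTF-8. The sketch below assumes that technique; `Utf8StringWriter`, `Utf8Demo`, and the `<Root>` element are hypothetical names. It also prints the encoding a plain StringWriter reports, which is the root cause described above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter that reports UTF-8 (no BOM) instead of UTF-16.
// XmlWriter reads TextWriter.Encoding when it writes the XML declaration.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => new UTF8Encoding(false);
}

public static class Utf8Demo
{
    // The encoding name an ordinary StringWriter reports.
    public static string DefaultWriterEncoding() => new StringWriter().Encoding.WebName;

    // Writes a one-element document through the UTF-8-reporting writer.
    public static string WriteWithUtf8Writer()
    {
        var sw = new Utf8StringWriter();
        using (XmlWriter writer = XmlWriter.Create(sw, new XmlWriterSettings()))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("Root", "Hello, World");
            writer.WriteEndDocument();
        }
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(DefaultWriterEncoding()); // utf-16
        Console.WriteLine(WriteWithUtf8Writer());
        // <?xml version="1.0" encoding="utf-8"?><Root>Hello, World</Root>
    }
}
```

This only changes what the declaration says; the resulting .NET string is still UTF-16 in memory. If you need actual UTF-8 bytes, write to a Stream such as a MemoryStream instead, because XmlWriterSettings.Encoding is honored when the target is a Stream.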


How ï»¿ and 65279 and Other Byte Order Marks (BOM) Can Mess Up Your XML


When you download XML text from the Web, you may find "garbage characters" at the start of your XML string. For example, I encountered this result when I downloaded an XML string using the WebClient.DownloadString method:

ï»¿<Root><Item>Hello, World</Item></Root>

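What you are likely seeing is a Byte Order Mark (BOM), the Unicode character U+FEFF (65279 in decimal), which indicates the endianness (byte order) of a text file or stream. The BOM is optional; if present, it appears only at the very start of the stream. It can also indicate which Unicode encoding the text uses: in UTF-8 it is the byte sequence EF BB BF, while in UTF-16 it is FE FF (big-endian) or FF FE (little-endian). If the UTF-8 bytes are decoded as Latin-1 or Windows-1252, they show up as the three characters ï»¿; if they are decoded as UTF-8 but not stripped, they remain as a single invisible U+FEFF character at the start of the string.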
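The sketch below reproduces both symptoms from an in-memory byte array rather than a live download, so it runs offline; the payload standing in for the downloaded response, and the names `BomDemo`, `MakePayload`, and the `Decode*` methods, are hypothetical. It then shows two ways to get clean XML: decode through a StreamReader that detects and skips the BOM, or trim U+FEFF from a string you already have.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomDemo
{
    // UTF-8 bytes for a small XML document, preceded by the UTF-8 BOM (EF BB BF).
    public static byte[] MakePayload()
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes("<Root><Item>Hello, World</Item></Root>");
        byte[] all = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, all, 0, bom.Length);
        Buffer.BlockCopy(body, 0, all, bom.Length, body.Length);
        return all;
    }

    // Wrong decoder: each BOM byte becomes its own Latin-1 character, giving "ï»¿".
    public static string DecodeAsLatin1(byte[] data) =>
        Encoding.GetEncoding("iso-8859-1").GetString(data);

    // Encoding.GetString does not strip the BOM: it survives as U+FEFF (65279).
    public static string DecodeAsUtf8(byte[] data) =>
        Encoding.UTF8.GetString(data);

    // StreamReader detects the BOM, picks the matching encoding, and skips it.
    public static string DecodeWithBomDetection(byte[] data)
    {
        using (var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] payload = MakePayload();
        Console.WriteLine(DecodeAsLatin1(payload));        // ï»¿<Root>...
        Console.WriteLine((int)DecodeAsUtf8(payload)[0]);  // 65279

        // Either decode with BOM detection, or trim U+FEFF from an existing string.
        string clean = DecodeWithBomDetection(payload);
        string trimmed = DecodeAsUtf8(payload).TrimStart('\uFEFF');

        var doc = new XmlDocument();
        doc.LoadXml(clean);
        Console.WriteLine(doc.DocumentElement.Name);        // Root
        Console.WriteLine(clean == trimmed);                // True
    }
}
```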
