Saturday, March 22, 2014

The mystery of the missing WhiteSpace in XML Attribute values

My expectation when testing the component that handles the import & export of xml data for the system, was that the input & output xml data should be the same if the component works properly. However, I noticed minor whitespace differences in attribute values between the two xml files and I couldn't find any code that was trimming attribute values during either the import nor the export of xml. A closer look showed that the spaces in attribute values were being trimmed and multiple occurrences of space replaced by a single space when the attribute values were being read in!

This was very intriguing so digging around in the XML Specs, says that in the canonical form of an XML document, attribute values are normalized by the XML processor. The Attribute Value Normalization section further lists out the exact algorithm to be used by the XML processor where-in all occurrences of whitespace are replaced by a space. Furthermore, if the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by trimming leading and trailing space and by replacing sequences of spaces by a single space. Also, there is a separate section to handle new line characters which states that all line-breaks or occurrences of CR & LF must be replaced by a LF character.


Related Posts Plugin for WordPress, Blogger...