When testing the component that handles the import and export of XML data for the system, I expected the input and output XML to be identical if the component worked properly. However, I noticed minor whitespace differences in attribute values between the two XML files, and I couldn't find any code that trimmed attribute values during either the import or the export. A closer look showed that leading and trailing spaces in attribute values were being trimmed, and runs of spaces collapsed into a single space, while the attribute values were being read in!
This was intriguing, so I dug around in the XML specs. They say that in the canonical form of an XML document, attribute values are normalized by the XML processor. The Attribute-Value Normalization section of the XML 1.0 specification lists the exact algorithm the processor must use: every whitespace character (space, tab, carriage return, line feed) in an attribute value is replaced by a single space character. Furthermore, if the attribute type is not CDATA, the processor must further process the normalized value by discarding leading and trailing spaces and by replacing each sequence of spaces with a single space, which matches what I observed. There is also a separate section on end-of-line handling, which states that before parsing, every CR LF pair and every CR not followed by an LF must be translated to a single LF character.
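To make the rules concrete, here is a minimal Python sketch of the normalization steps described above. It is an illustration under stated assumptions, not the code of any particular parser: the function names `normalize_line_endings` and `normalize_attribute_value` and the `is_cdata` flag are hypothetical, and the sketch ignores character and entity references, which the full algorithm in the spec also handles. The last function parses a tiny document with the standard library's `xml.etree.ElementTree` to show a real parser applying the whitespace replacement to an undeclared attribute, which is treated as CDATA.

```python
import re
import xml.etree.ElementTree as ET

# The four characters the XML 1.0 spec treats as white space.
XML_WHITESPACE = re.compile(r"[\x20\x09\x0D\x0A]")


def normalize_line_endings(text: str) -> str:
    """Translate CR LF pairs and lone CRs to a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_attribute_value(raw: str, is_cdata: bool = True) -> str:
    """Sketch of attribute-value normalization (references are ignored).

    `is_cdata` is a hypothetical flag standing in for the attribute type
    that a real processor would read from the DTD.
    """
    value = normalize_line_endings(raw)
    # Step 1: every white space character becomes a single space.
    value = XML_WHITESPACE.sub(" ", value)
    if not is_cdata:
        # Step 2 (non-CDATA only): collapse runs of spaces, then trim.
        value = re.sub(" +", " ", value).strip(" ")
    return value


def parsed_attribute(xml_text: str, name: str) -> str:
    """Return an attribute of the root element as the parser reports it."""
    return ET.fromstring(xml_text).get(name)


if __name__ == "__main__":
    raw = "  a \t b\r\nc  "
    print(repr(normalize_attribute_value(raw)))                  # CDATA
    print(repr(normalize_attribute_value(raw, is_cdata=False)))  # non-CDATA
    print(repr(parsed_attribute('<item name="a\tb\nc"/>', "name")))
```

For a CDATA attribute the spaces survive but tabs and line breaks turn into spaces; only a non-CDATA attribute is trimmed and collapsed. A round-trip test that compares attribute values should therefore normalize both sides first, or compare the parsed values rather than the raw text.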