XML
Overview
The eXtensible Markup Language (XML) is a language for defining languages; we will use it to define a hierarchical data structure. Many languages used for data interchange are defined in XML; examples include SOAP, RSS, and XHTML. Much of the data and information of Web 2.0 is encoded using XML: Amazon, Flickr, newspapers and many others provide access to their holdings as XML documents.
Many programming languages and applications, such as the Internet Explorer (IE) Web browser, can parse XML: a parser reads the XML and constructs a corresponding hierarchical tree, the Document Object Model (DOM). A program such as IE first uses a parser to construct the tree, then references nodes on the tree to access each element of the XML.
One downside to XML on mobile devices is verbosity: because it is designed to be general-purpose, its use typically requires more resources than a tailored solution. Because most XML use is Web-based with the mobile device acting as a client, one can implement an intermediary server that passes the mobile device's requests to a Web host. The host returns XML to the intermediary, which parses it, extracts the appropriate data, and forwards only that data to the mobile device. While this reduces the operational impact on the device's resources, the architecture is somewhat more complicated to implement.
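The intermediary's extraction step might look like the following minimal sketch. The XML response, its element names, and the functions `fieldText` and `toMobilePayload` are hypothetical, chosen only for illustration; a real intermediary would use a full XML parser rather than a regular expression.

```javascript
// Hypothetical XML as a Web host might return it; element names are illustrative.
const hostResponse =
  '<quote><carrier>UPS</carrier><service>Ground</service>' +
  '<rate currency="USD">27.50</rate><transitDays>3</transitDays></quote>';

// Return the text of the first <name>...</name> element, or null if absent.
// A regular expression is enough for this flat sketch only.
function fieldText(xml, name) {
  const pattern = new RegExp("<" + name + "(?:\\s[^>]*)?>([^<]*)</" + name + ">");
  const match = pattern.exec(xml);
  return match ? match[1] : null;
}

// Keep only what the mobile client needs and send it in a compact form.
function toMobilePayload(xml) {
  return JSON.stringify({ rate: Number(fieldText(xml, "rate")) });
}

const payload = toMobilePayload(hostResponse);
console.log(payload);                               // {"rate":27.5}
console.log(payload.length < hostResponse.length);  // true
```

The mobile device receives a few bytes instead of the full document, at the cost of running and maintaining the extra server.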
XML (eXtensible Markup Language)
A language for defining and representing languages.
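- XHTML is a language defined in XML. HTML itself is not: it derives from SGML, the more general markup language of which XML is a simplified subset.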
- One key use is to define a data structure and data within the structure, useful for exchanging data between different applications and computer systems.
- XML gives semantic meaning to data. For example, <Zip>47150</Zip> can mean the Zip code is 47150.
- XML is written as text that is readable and writeable by humans, so it is also computer architecture neutral. It does not depend upon a specific bit-size representation; a floating point number is represented as the characters 3.14, for example. XML can be, and commonly is, used to communicate data over the Internet without regard to the characteristics of the sending or receiving machine.
- Many database systems such as Oracle can generate entries from a database in XML form for use by other applications that understand XML without regard to the computer system word size, etc.
- XML combined with XSL can separate implementation of the user interface from the data being presented.
One common XML use is to supply structured data to an application. Information for shipping a package from one Zip code to another can be specified in XML with a hierarchy tree as:
<!-- UPS Package -->
<package>
  <To Signature="Yes">47150</To>
  <From>47165</From>
  <Weight>17.0</Weight>
  <Rate>27.50</Rate>
</package>
XML Objects
As the example above indicates, XML can be viewed as a tree consisting of the following objects:
- Element: <package>
- Attribute: Signature="Yes"
- Text: 47165
- Comment: <!-- UPS Package -->
The element <package> has four children: child(0) is element <To>, etc.
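One way to picture these objects is as nested records. The sketch below models the package document with plain JavaScript objects; the property names (`type`, `name`, `attributes`, `children`, `text`) and the helper `textOf` are assumptions made for illustration, not a real DOM API.

```javascript
// Hypothetical plain-object model of the package document (not a real DOM API).
const pkg = {
  type: "element",
  name: "package",
  attributes: {},
  children: [
    { type: "element", name: "To", attributes: { Signature: "Yes" },
      children: [{ type: "text", text: "47150" }] },
    { type: "element", name: "From", attributes: {},
      children: [{ type: "text", text: "47165" }] },
    { type: "element", name: "Weight", attributes: {},
      children: [{ type: "text", text: "17.0" }] },
    { type: "element", name: "Rate", attributes: {},
      children: [{ type: "text", text: "27.50" }] },
  ],
};

// The comment precedes the root element, so it is a sibling of <package>.
const doc = [{ type: "comment", text: " UPS Package " }, pkg];

// Return the concatenated text directly enclosed by an element.
function textOf(element) {
  return element.children
    .filter((child) => child.type === "text")
    .map((child) => child.text)
    .join("");
}

const to = pkg.children[0];            // child(0) of <package>
console.log(pkg.children.length);      // 4
console.log(to.name, textOf(to));      // To 47150
console.log(to.attributes.Signature);  // Yes
```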
Basic XML Rules
- Hierarchical element structure - XML documents must have a strictly hierarchical tag structure. That is, start tags must have exact corresponding end tags. In XML vocabulary, a pair of start and end tags, together with what they enclose, is called an element. Every element other than the single root element must be completely nested within another.
- Well-formed - <To>47150</To> is well-formed because there is a <To> start tag and a matching </To> end tag.
- Case sensitive - XML is case sensitive. <to>47150</To> is not well-formed because the start tag <to> and the end tag </To> do not match.
- Empty - An empty element can be written as <To></To> or equivalently as <To/>.
- Text - A non-empty element can enclose other elements or text. <To>47150</To> encloses text of 47150.
- Attribute - <To Signature="Yes">47150</To> has one attribute, Signature, with the value "Yes".
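The nesting and case-sensitivity rules above can be checked with a stack of open element names. The following is a minimal sketch, assuming a hypothetical function name `isWellNested`; it checks tag nesting only and does not validate attributes, text, comments, or the single-root requirement. It would also be confused by a `>` inside an attribute value.

```javascript
// Check only that start and end tags nest correctly (case-sensitive).
function isWellNested(xml) {
  // Groups: optional "/" of an end tag, element name, optional "/" of an empty element.
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)[^<>]*?(\/?)>/g;
  const open = [];                          // stack of currently open element names
  let match;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [, closing, name, selfClosing] = match;
    if (closing && selfClosing) return false;   // </To/> is not a legal tag
    if (selfClosing) continue;                  // <To/> opens and closes at once
    if (!closing) { open.push(name); continue; }
    if (open.pop() !== name) return false;      // end tag must match the latest start tag
  }
  return open.length === 0;                     // every start tag was closed
}

console.log(isWellNested('<To Signature="Yes">47150</To>'));     // true
console.log(isWellNested("<to>47150</To>"));                     // false
console.log(isWellNested("<package><To>47150</package></To>"));  // false
```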
Document Type Definition (DTD)
DTD Grammar Definition
Examples
<!ELEMENT AorBB (A | B+)>
defines a rule where:
<AorBB> <A/> </AorBB>
<AorBB> <B/> </AorBB>
<AorBB> <B/> <B/> <B/> <B/> </AorBB>
are valid but:
<AorBB> <A/> <B/> </AorBB>
<AorBB> <A/> <A/> <A/> </AorBB>
are invalid.
<!ELEMENT AandBB (A , B+)>
defines a rule where:
<AandBB> <A/> <B/> </AandBB>
<AandBB> <A/> <B/> <B/> <B/> </AandBB>
are valid but:
<AandBB> <A/> </AandBB>
<AandBB> <B/> </AandBB>
<AandBB> <B/> <A/> </AandBB>
<AandBB> <B/> <B/> </AandBB>
are invalid.
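A DTD content model behaves much like a regular expression over the sequence of child element names: `|` is a choice, `,` is a sequence, and `+` means one or more. The sketch below hand-translates the two rules (A | B+) and (A , B+) into JavaScript regular expressions; the names `contentModels` and `childrenMatch` are hypothetical, and a validating parser does this work from the DTD itself.

```javascript
// Each content model becomes a pattern over child names,
// where every child name is written followed by one space.
const contentModels = {
  AorBB: /^(A |(B )+)$/,   // (A | B+): one A, or one or more B
  AandBB: /^A (B )+$/,     // (A , B+): one A followed by one or more B
};

// Report whether the listed children satisfy the element's content model.
function childrenMatch(elementName, childNames) {
  const sequence = childNames.map((name) => name + " ").join("");
  return contentModels[elementName].test(sequence);
}

console.log(childrenMatch("AorBB", ["A"]));                  // true
console.log(childrenMatch("AorBB", ["B", "B", "B", "B"]));   // true
console.log(childrenMatch("AorBB", ["A", "B"]));             // false
console.log(childrenMatch("AandBB", ["A", "B", "B", "B"]));  // true
console.log(childrenMatch("AandBB", ["B", "A"]));            // false
```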
Parsing
A well-formed XML document's elements can be parsed into a tree structure that lends itself to programmatic operation, traversal, insertion, pruning, etc.
Note that an XML document need only be well-formed to be parsed; checking it against a DTD is a separate validation step.
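To make the idea concrete, here is a minimal sketch of a parser that builds a tree from element tags alone and then traverses it. The function names `parseElements` and `traverse` and the node shape `{ name, children }` are assumptions for illustration; text, attributes, comments, and DOCTYPE lines are simply ignored, so this is far from a complete XML parser.

```javascript
// Build a tree from tags such as <exp>, </exp>, and <a/>.
function parseElements(xml) {
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)\s*(\/?)>/g;
  const root = { name: "#document", children: [] };
  const stack = [root];                   // path from the root to the current node
  let match;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [, closing, name, selfClosing] = match;
    const parent = stack[stack.length - 1];
    if (closing) {
      // An end tag must close the most recently opened element.
      if (stack.length === 1 || parent.name !== name) {
        throw new Error("Mismatched end tag: " + name);
      }
      stack.pop();
    } else {
      const node = { name, children: [] };
      parent.children.push(node);
      if (!selfClosing) stack.push(node); // descend into a non-empty element
    }
  }
  if (stack.length !== 1) {
    throw new Error("Unclosed element: " + stack[stack.length - 1].name);
  }
  return root.children[0];
}

// Collect one line per element, indenting children with "| ".
function traverse(node, indent, lines = []) {
  lines.push(indent + node.name);
  for (const child of node.children) traverse(child, indent + "| ", lines);
  return lines;
}

const tree = parseElements(
  "<exp><lparen/><exp><exp><a/></exp><plus/><exp><b/></exp></exp><rparen/></exp>"
);
console.log(traverse(tree, "").join("\n"));
```

Once the tree exists, pruning and insertion are ordinary array operations on a node's `children`, for example `tree.children.pop()` to remove the last child.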
Example
The following is a complete example that parses and displays the resulting tree. To test:
Copy the following three files to the same directory.
Open the Exp.htm file in an IE browser; the example requires the Microsoft.XMLDOM parser, which is available only in IE.
There are three files required:
- Exp.dtd the grammar definition
<!ELEMENT exp ((exp, plus, exp) | (exp, times, exp) | ( lparen, exp, rparen) | a | b | c )>
<!ELEMENT a EMPTY>
<!ELEMENT b EMPTY>
<!ELEMENT c EMPTY>
<!ELEMENT plus EMPTY>
<!ELEMENT times EMPTY>
<!ELEMENT lparen EMPTY>
<!ELEMENT rparen EMPTY>
- Exp.xml the document containing an example based on the grammar; it encodes the expression (a+b).
<!DOCTYPE exp SYSTEM "Exp.dtd">
<exp>
<lparen/>
<exp>
<exp><a/></exp>
<plus/>
<exp><b/></exp>
</exp>
<rparen/>
</exp>
- Exp.htm JavaScript that parses, traverses and prints the resulting parse tree.
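<SCRIPT LANGUAGE="JavaScript">
// Microsoft.XMLDOM is an ActiveX parser, so this page runs only in IE.
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false; // load synchronously so the tree is complete when load() returns
try {
  // load() returns false rather than throwing when parsing or validation fails
  if (!xmlDoc.load("Exp.xml")) throw new Error(xmlDoc.parseError.reason);
  document.write("<pre>");
  traverse(xmlDoc.documentElement, "");
  document.write("</pre>");
} catch (e) {
  document.write("URL " + xmlDoc.parseError.url +
    " Line " + xmlDoc.parseError.line +
    " position " + xmlDoc.parseError.linepos + " " +
    xmlDoc.parseError.srcText + " " + xmlDoc.parseError.reason);
}

// Print each element name, indenting children one level deeper than their parent.
function traverse(node, indent) {
  var i, children, type = node.nodeTypeString;
  if (type == "element") {
    document.write("<br>" + indent + node.nodeName);
    children = node.childNodes;
    if (children != null)
      for (i = 0; i < children.length; i++)
        traverse(children.item(i), indent + "| ");
  }
}
</SCRIPT>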