XML

Overview

The eXtensible Markup Language (XML) is a language for defining languages; here we use it to define a hierarchical data structure. Many languages used for data interchange are defined in XML, including SOAP, RSS, and XHTML. Much of the data on the Web 2.0 is encoded in XML: Amazon, Flickr, newspapers, and many others provide access to their holdings as XML documents.

Many programming languages and applications, such as the Internet Explorer (IE) Web browser, can parse XML: a parser reads the document and constructs the corresponding hierarchical tree, the Document Object Model (DOM). A program such as IE first uses the parser to construct the tree, then references nodes on the tree to access each element of the XML.

One downside of XML on mobile devices is its verbosity: because XML is designed to be general-purpose, its use typically requires more resources than a tailored solution. Because most XML use is Web-based, with the mobile device acting as a client, one can implement an intermediary server: it passes the mobile device's requests to a Web host, receives the XML response, parses it, and forwards only the relevant data to the mobile device. This reduces the resource cost on the device, but the architecture is somewhat more complicated to implement.

XML (eXtensible Markup Language) 

A language for defining and representing languages. 

One common use of XML is to supply structured data to an application. For example, the information for shipping a package from one ZIP code to another can be specified in XML as the following hierarchy:
 

<!-- UPS Package -->
<package>
  <To Signature="Yes">47150</To>
  <From>47165</From>
  <Weight>17.0</Weight>
  <Rate>27.50</Rate>
</package>

XML Objects

As the example above indicates, XML can be viewed as a tree consisting of the following objects:

The element <package> has four child elements: child(0) is <To>, child(1) is <From>, child(2) is <Weight>, and child(3) is <Rate>.
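
To make the tree view concrete, the package document can be modeled by hand with plain JavaScript objects. This is only a sketch: the `element` helper and the property names `name`, `attributes`, `text`, and `children` are assumptions chosen for illustration, not the interface of any real parser.

```javascript
// Hypothetical helper: builds one element node of the tree.
function element(name, attributes, text, children) {
  return {
    name: name,
    attributes: attributes || {},
    text: text || "",
    children: children || []
  };
}

// The <package> document from above, written out as a tree.
var packageTree = element("package", {}, "", [
  element("To", { Signature: "Yes" }, "47150"),
  element("From", {}, "47165"),
  element("Weight", {}, "17.0"),
  element("Rate", {}, "27.50")
]);

// child(0) of <package> is <To>; its attribute and text hang off that node.
var to = packageTree.children[0];
console.log(to.name, to.attributes.Signature, to.text); // To Yes 47150
console.log(packageTree.children.length);               // 4
```

A parser builds a structure of this kind automatically; the point here is only that elements nest as children while attributes and text belong to a single node.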

 

Basic XML Rules

 

Document Type Definition (DTD)

 

DTD Grammar Definition

Examples

<!ELEMENT AorBB (A | B+)>

defines a rule where:

<AorBB> <A/> </AorBB>
<AorBB> <B/> </AorBB>
<AorBB> <B/> <B/> <B/> <B/> </AorBB>

are valid but:

<AorBB> <A/> <B/> </AorBB>
<AorBB> <A/> <A/> <A/> </AorBB>

are invalid.

<!ELEMENT AandBB (A , B+)>

defines a rule where:

<AandBB> <A/> <B/> </AandBB>
<AandBB> <A/> <B/> <B/> <B/> </AandBB>

are valid but:

<AandBB> <A/> </AandBB>
<AandBB> <B/> </AandBB>
<AandBB> <B/> <A/> </AandBB>
<AandBB> <B/> <B/> </AandBB>

are invalid.
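
A content model behaves much like a regular expression over the sequence of child element names: `|` is a choice, `,` is a sequence, and `+` means one or more. The sketch below checks the examples above that way. The `matches` helper and the hand-translated patterns are illustrative assumptions; a validating parser derives the check from the DTD itself.

```javascript
// Hypothetical helper: join the child names with spaces and test them
// against a regular expression that mirrors the DTD content model.
function matches(pattern, childNames) {
  return pattern.test(childNames.join(" "));
}

// <!ELEMENT AorBB (A | B+)>  : exactly one A, or one or more B
var AorBB = /^(A|B( B)*)$/;
// <!ELEMENT AandBB (A , B+)> : one A followed by one or more B
var AandBB = /^A( B)+$/;

console.log(matches(AorBB, ["A"]));            // true
console.log(matches(AorBB, ["B", "B", "B"]));  // true
console.log(matches(AorBB, ["A", "B"]));       // false
console.log(matches(AandBB, ["A", "B", "B"])); // true
console.log(matches(AandBB, ["B", "A"]));      // false
```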

Parsing

A well-formed XML document's elements can be parsed into a tree structure that lends itself to programmatic operation, traversal, insertion, pruning, etc.

Note that an XML document need only be well-formed to be parsed; validity against a DTD is a separate, stronger requirement.
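
The way a parser turns well-formed markup into a tree can be sketched with a stack of open elements. The `parseElements` function below is a deliberately small illustration and an assumption of these notes, not a conforming XML parser: it handles only start tags, end tags, and empty-element tags, skips comments, processing instructions, and declarations, ignores text and attributes, and assumes no attribute value contains `>`.

```javascript
// Minimal sketch: build a tree of element names from simple XML text.
// Throws when tags are not properly nested, i.e. the text is not well-formed.
function parseElements(xml) {
  var root = null;
  var stack = [];   // elements opened but not yet closed
  // Alternatives: comment | processing instruction | <!...> declaration |
  // end tag (group 1) | start or empty-element tag (groups 2 and 3).
  var tag = /<!--[\s\S]*?-->|<\?[\s\S]*?\?>|<![^>]*>|<\/([^\s>]+)\s*>|<([^\s\/>!]+)[^>]*?(\/?)>/g;
  var m;
  while ((m = tag.exec(xml)) !== null) {
    if (m[1] !== undefined) {
      // End tag: it must close the innermost open element.
      var open = stack.pop();
      if (!open || open.name !== m[1]) throw new Error("Mismatched </" + m[1] + ">");
    } else if (m[2] !== undefined) {
      // Start tag or empty-element tag: attach to the current parent.
      var node = { name: m[2], children: [] };
      if (stack.length > 0) stack[stack.length - 1].children.push(node);
      else if (root === null) root = node;
      else throw new Error("More than one root element");
      if (m[3] !== "/") stack.push(node);   // <x/> has no content to wait for
    }
  }
  if (stack.length > 0) throw new Error("Unclosed <" + stack[stack.length - 1].name + ">");
  if (root === null) throw new Error("No root element");
  return root;
}

var tree = parseElements("<exp><lparen/><exp><a/></exp><rparen/></exp>");
console.log(tree.name, tree.children.length);   // exp 3
```

By contrast, `parseElements("<a><b></a>")` throws, because `</a>` arrives while `<b>` is still open.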

 

Example

The following is a complete example that parses an XML document and displays the resulting tree. To test:

Copy the following three files to the same directory.

Open the Exp.htm file in an IE browser; the script requires the Microsoft.XMLDOM parser, which is available only in IE.

The three required files are:

  1. Exp.dtd the grammar definition
    <!-- plus and times share one branch so that the content model is deterministic -->
    <!ELEMENT exp ((exp, (plus | times), exp) | (lparen, exp, rparen) | a | b | c)>
    <!ELEMENT a EMPTY>
    <!ELEMENT b EMPTY>
    <!ELEMENT c EMPTY>
    <!ELEMENT plus EMPTY>
    <!ELEMENT times EMPTY>
    <!ELEMENT lparen EMPTY>
    <!ELEMENT rparen EMPTY>

     

  2. Exp.xml the document containing an example based on the grammar; it defines the expression (a+b).
    <!DOCTYPE exp SYSTEM "Exp.dtd">

    <exp>
      <lparen/>
      <exp>
        <exp><a/></exp>
        <plus/>
        <exp><b/></exp>
      </exp>
      <rparen/>
    </exp>

     

  3. Exp.htm JavaScript that parses the document, then traverses and prints the resulting parse tree.
    <SCRIPT LANGUAGE="JavaScript">
      // Microsoft.XMLDOM is the MSXML parser; it is available only in IE.
      var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
      xmlDoc.async = false;  // load synchronously so the tree is ready when load() returns

      // load() returns false, rather than throwing, when parsing or validation fails.
      if (xmlDoc.load("Exp.xml")) {
        document.write("<pre>");
        traverse(xmlDoc.documentElement, "");
        document.write("</pre>");
      } else {
        document.write(
          "URL " + xmlDoc.parseError.url + " Line " + xmlDoc.parseError.line +
          " position " + xmlDoc.parseError.linepos + " " +
          xmlDoc.parseError.srcText + " " + xmlDoc.parseError.reason);
      }

      // Print each element name, indenting one level per depth in the tree.
      function traverse(node, indent) {
        var i, children, type = node.nodeTypeString;
        if (type == "element") {
          document.write("<br>" + indent + node.nodeName);
          children = node.childNodes;
          if (children != null)
            for (i = 0; i < children.length; i++)
              traverse(children.item(i), indent + "| ");
        }
      }
    </SCRIPT>
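
Microsoft.XMLDOM is specific to IE. In current browsers the same traversal can be sketched with the standard `DOMParser` interface; note that `DOMParser` checks only well-formedness and does not validate against Exp.dtd. The function names `treeLines` and `showTree` below are illustrative assumptions, and `showTree` must be run in a browser page.

```javascript
// Collect one line per element, indented by depth, mirroring traverse() above.
function treeLines(node, indent, lines) {
  lines = lines || [];
  if (node.nodeType === 1) {   // 1 is ELEMENT_NODE in the DOM
    lines.push(indent + node.nodeName);
    for (var i = 0; i < node.childNodes.length; i++)
      treeLines(node.childNodes[i], indent + "| ", lines);
  }
  return lines;
}

// Browser-only: parse XML text and show the element tree in a <pre> block.
function showTree(xmlText) {
  var doc = new DOMParser().parseFromString(xmlText, "application/xml");
  // On a well-formedness error the returned document contains a <parsererror> element.
  var failed = doc.querySelector("parsererror") !== null;
  var pre = document.createElement("pre");
  pre.textContent = failed
    ? "XML is not well-formed"
    : treeLines(doc.documentElement, "").join("\n");
  document.body.appendChild(pre);
}
```

Because `treeLines` returns the lines instead of calling `document.write`, it can be reused wherever a DOM-like tree is available; only `showTree` depends on the browser.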