
10.1. Working with XML

Being literate in one's spoken language is defined as having the basic ability to read and write that language. In XML, functional literacy embraces more than reading and writing XML data. In addition to the XML data document, there is an XML Schema document (.xsd) that is used to validate the content and structure of an XML document. If the XML data is to be displayed or transformed, one or more XML style sheets (.xsl) can be used to define the transformation. Thus, we can define our own form of XML literacy as the ability to do five things:

  1. Create an XML file.

  2. Read and query an XML file.

  3. Create an XML Schema document.

  4. Use an XML Schema document to validate XML data.

  5. Create and use an XML style sheet to transform XML data.

The purpose of this section is to introduce XML concepts and terminology, as well as some .NET techniques for performing the preceding tasks. Of the five tasks, all are covered in this section, with the exception of reading and querying XML data, which is presented in later sections.

Using XML Serialization to Create XML Data

As discussed in Chapter 4, "Working with Objects in C#," serialization is a convenient way to store objects so they can later be deserialized into the original objects. If the natural state of your data allows it to be represented as objects, or if your application already has it represented as objects, XML serialization often offers a good choice for converting it into an XML format. However, there are some restrictions to keep in mind when applying XML serialization to a class:

  • The class must contain a public default (parameterless) constructor.

  • Only a public property or field can be serialized.

  • A read-only property cannot be serialized.

  • To serialize the objects in a custom collection class, the class must derive from the System.Collections.CollectionBase class and include an indexer. The easiest way to serialize multiple objects is usually to place them in a strongly typed array.
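The restrictions above can be seen in a small sketch. The FilmNote class and the RestrictionDemo helper are hypothetical names introduced only for illustration; they are not part of the chapter's listings. The public field and the read/write property should appear in the output, while the read-only property should be skipped.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class used only to illustrate the serialization restrictions
public class FilmNote
{
    private int year = 1942;              // private field: never serialized directly

    public FilmNote() { }                 // parameterless constructor required by XmlSerializer

    public string Title = "Casablanca";   // public field: serialized

    public int Year                       // public read/write property: serialized
    {
        get { return year; }
        set { year = value; }
    }

    public string Label                   // read-only property: skipped by the serializer
    {
        get { return Title + " (" + year + ")"; }
    }
}

public static class RestrictionDemo
{
    // Serializes a FilmNote to an XML string held in memory
    public static string ToXml(FilmNote note)
    {
        XmlSerializer xs = new XmlSerializer(typeof(FilmNote));
        using (StringWriter sw = new StringWriter())
        {
            xs.Serialize(sw, note);
            return sw.ToString();
        }
    }
}
```

Calling RestrictionDemo.ToXml(new FilmNote()) and printing the result is a quick way to check which members survive serialization before committing to a class design.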

An Example Using the XmlSerializer Class

Listing 10-1 shows the XML file that we're going to use for further examples in this section. It was created by serializing instances of the class shown in Listing 10-2.

Listing 10-1. Sample XML File

<?xml version="1.0" standalone="yes"?>
<films>
   <movies>
      <movie_ID>5</movie_ID>
      <movie_Title>Citizen Kane</movie_Title>
      <movie_Year>1941</movie_Year>
      <movie_Director>Orson Welles</movie_Director>
      <bestPicture>Y</bestPicture>
      <AFIRank>1</AFIRank>
   </movies>
   <movies>
      <movie_ID>6</movie_ID>
      <movie_Title>Casablanca</movie_Title>
      <movie_Year>1942</movie_Year>
      <movie_Director>Michael Curtiz</movie_Director>
      <bestPicture>Y</bestPicture>
      <AFIRank>2</AFIRank>
   </movies>
</films>


In comparing Listings 10-1 and 10-2, it should be obvious that the XML elements are a direct rendering of the public properties defined for the movies class. The only exceptional feature in the code is the XmlElement attribute, which will be discussed shortly.

Listing 10-2. Using XmlSerializer to Create an XML File

using System;
using System.IO;                 // TextWriter and StreamWriter
using System.Xml;
using System.Xml.Serialization;
// other code here ...
public class movies
{
   public movies()  // Parameterless constructor is required
   {   }
   public movies(int ID, string title, string dir, string pic,
                 int yr, int movierank)
   {
      movie_ID = ID;
      movie_Director = dir;
      bestPicture = pic;
      rank = movierank;
      movie_Title = title;
      movie_Year = yr;
   }
   // Public properties that are serialized
   public int movie_ID
   {
      get { return mID; }
      set { mID = value; }
   }
   public string movie_Title
   {
      get { return mTitle; }
      set { mTitle = value; }
   }
   public int movie_Year
   {
      get { return mYear; }
      set { mYear = value; }
   }
   public string movie_Director
   {
      get { return mDirector; }
      set { mDirector = value; }
   }
   public string bestPicture
   {
      get { return mbestPicture; }
      set { mbestPicture = value; }
   }
   // Serialized as <AFIRank> rather than <rank>
   [XmlElement("AFIRank")]
   public int rank
   {
      get { return mAFIRank; }
      set { mAFIRank = value; }
   }
   // Private backing fields are not serialized
   private int mID;
   private string mTitle;
   private int mYear;
   private string mDirector;
   private string mbestPicture;
   private int mAFIRank;
}


To transform the class in Listing 10-2 to the XML in Listing 10-1, we follow the three steps shown in the code that follows. First, the objects to be serialized are created and stored in an array. Second, an XmlSerializer object is created. Its constructor (one of many constructor overloads) takes the object type it is serializing as the first parameter and an attribute as the second. The attribute enables us to assign "films" as the name of the root element in the XML output. The final step is to execute the XmlSerializer.Serialize method to send the serialized output to a selected stream, a file in this case.


// (1) Create array of objects to be serialized
movies[] films = {new movies(5,"Citizen Kane","Orson Welles",
                             "Y", 1941,1 ),
                  new movies(6,"Casablanca","Michael Curtiz",
                             "Y", 1942,2)};
// (2) Create serializer
//     This attribute is used to assign name to XML root element
XmlRootAttribute xRoot = new XmlRootAttribute();
xRoot.ElementName = "films";
xRoot.Namespace = "http://www.corecsharp.net";
xRoot.IsNullable = true;
// Specify that an array of movies types is to be serialized
XmlSerializer xSerial = new XmlSerializer(typeof(movies[]),
                                          xRoot);
string filename = @"c:\oscarwinners.xml";
// (3) Stream to write XML into; the using block flushes
//     and closes the file when serialization is finished
using (TextWriter writer = new StreamWriter(filename))
{
   xSerial.Serialize(writer, films);
}
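Because serialization is meant to be reversible, it may help to see the return trip as well. The following is a minimal sketch, not the chapter's own code: it assumes a cut-down, hypothetical Film class in place of the full movies class, and it serializes to an in-memory string rather than a file so that it can be run without touching the disk. The RoundTrip helper and its method names are likewise invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical, reduced stand-in for the movies class
public class Film
{
    public int ID;
    public string Title;

    public Film() { }   // required by XmlSerializer
    public Film(int id, string title) { ID = id; Title = title; }
}

public static class RoundTrip
{
    // The same root-element override must be used for reading and writing
    private static XmlSerializer MakeSerializer()
    {
        XmlRootAttribute root = new XmlRootAttribute("films");
        return new XmlSerializer(typeof(Film[]), root);
    }

    // Serialize the array to an XML string
    public static string Save(Film[] films)
    {
        using (StringWriter sw = new StringWriter())
        {
            MakeSerializer().Serialize(sw, films);
            return sw.ToString();
        }
    }

    // Rebuild the array from the XML string
    public static Film[] Load(string xml)
    {
        using (StringReader sr = new StringReader(xml))
        {
            return (Film[])MakeSerializer().Deserialize(sr);
        }
    }
}
```

Passing the result of RoundTrip.Save back into RoundTrip.Load should yield an array with the same field values as the original. In a long-running application, a serializer built with an XmlRootAttribute is usually created once and reused rather than rebuilt on every call.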


Serialization Attributes

By default, the elements created from a class take the name of the property they represent. For example, the movie_Title property is serialized as a <movie_Title> element. However, there is a set of serialization attributes that can be used to override the default serialization results. Listing 10-2 includes an XmlElement attribute whose purpose is to assign a name to the XML element that is different from that of the corresponding property or field. In this case, the rank property name is replaced with AFIRank in the XML.

There are more than a dozen serialization attributes. Here are some other commonly used ones:

XmlAttribute

Is attached to a property or field and causes it to be rendered as an attribute within an element.


Example: [XmlAttribute("movieID")]
Result: <movies movieID="5">

XmlIgnore

Causes the field or property to be excluded from the XML.

XmlText

Causes the value of the field or property to be rendered as text. No elements are created for the member name.


Example: [XmlText]
         public string movie_Title{
Result: <movies movieID="5">Citizen Kane
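The three attributes described above can be combined on one class. The sketch below uses a hypothetical MovieNote class and AttributeDemo helper (neither appears in the chapter's listings) and writes to a string so the effect of each attribute can be inspected directly. The empty namespace entry and the OmitXmlDeclaration setting are only there to keep the output short.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical class, not the movies class from Listing 10-2
[XmlRoot("movies")]
public class MovieNote
{
    [XmlAttribute("movieID")]   // rendered as an attribute of <movies>
    public int ID;

    [XmlIgnore]                 // left out of the XML entirely
    public string Comment;

    [XmlText]                   // rendered as the text content of <movies>
    public string Title;

    public MovieNote() { }
}

public static class AttributeDemo
{
    public static string ToXml(MovieNote m)
    {
        XmlSerializer xs = new XmlSerializer(typeof(MovieNote));

        // An empty prefix/namespace pair suppresses the default xsi/xsd declarations
        XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
        ns.Add("", "");

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        StringWriter sw = new StringWriter();
        using (XmlWriter xw = XmlWriter.Create(sw, settings))
        {
            xs.Serialize(xw, m, ns);
        }
        return sw.ToString();
    }
}
```

With ID set to 5 and Title set to "Citizen Kane", the returned string should carry the ID as a movieID attribute and the title as plain text inside the element, with no trace of the Comment field.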


XML Schema Definition (XSD)

The XML Schema Definition document is an XML file that is used to validate the contents of another XML document. The schema is essentially a template that defines in detail what is permitted in an associated XML document. Its role is similar to that of the BNF (Backus-Naur Form) notation that defines a language's syntax for a compiler.

.NET provides several ways (others are included in Chapter 11, "ADO.NET") to create a schema from an XML data document. One of the easiest ways is to use the XML Schema Definition tool (Xsd.exe). Simply run it from a command line and specify the XML file for which it is to produce a schema:


C:\> xsd.exe oscarwinners.xml


The output, oscarwinners.xsd, is shown in Listing 10-3.

Listing 10-3. XML Schema to Apply Against XML in Listing 10-1

<xs:schema xmlns=""
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="films" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="movies">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="movie_ID" type="xs:int"
                     minOccurs="0" />
              <xs:element name="movie_Title" type="xs:string"
                     minOccurs="0" />
              <xs:element name="movie_Year" type="xs:int"
                     minOccurs="0" />
              <xs:element name="movie_Director" type="xs:string"
                     minOccurs="0" />
              <xs:element name="bestPicture" type="xs:string"
                     minOccurs="0" />
              <xs:element name="AFIRank" type="xs:int"
                     minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>


As should be evident from this small sample, the XML Schema language has a rather complex syntax. Those interested in all of its details can find them at the URL given in the schema's xmlns:xs namespace declaration. For those with a more casual interest, the most important thing to note is that the heart of the document is a description of the valid types that may be contained in the XML data that the schema describes. In addition to the string and int types shown here, other supported types include boolean, double, float, dateTime, and hexBinary.

The types specified in the schema are designated as simple or complex. The complexType element defines any node that has children or an attribute; a simpleType has no attribute or child. You'll encounter many schemas where the simple types are defined at the beginning of the schema, and complex types are later defined as a combination of simple types.

XML Schema Validation

A schema is used by a validator to check an XML document for conformance to the layout and content defined by the schema. .NET implements validation as a read and check process. As a class iterates through each node in an XML tree, the node is validated. Listing 10-4 illustrates how the XmlValidatingReader class performs this operation.

First, an XmlTextReader is created to stream through the nodes in the data document. It is passed as an argument to the constructor for the XmlValidatingReader. Then, the ValidationType property is set to indicate a schema will be used for validation. This property can also be set to XDR or DTD to support older validation schemas.

The next step is to add the schema that will be used for validating to the reader's schema collection. Finally, the XmlValidatingReader is used to read the stream of XML nodes. Exception handling is used to display any validation error that occurs.

Listing 10-4. XML Schema Validation

using System;
using System.Xml;
using System.Xml.Schema;
// other code here ...
private static bool ValidateSchema(string xml, string xsd)
{
   // Parameters: paths of the XML document and the schema
   // (1) Create a validating reader
   XmlTextReader tr = new XmlTextReader(xml);
   XmlValidatingReader xvr = new XmlValidatingReader(tr);
   // (2) Indicate schema validation
   xvr.ValidationType = ValidationType.Schema;
   // (3) Add schema to be used for validation
   xvr.Schemas.Add(null, xsd);
   try
   {
      Console.WriteLine("Validating: ");
      // Loop through all nodes in XML document
      while (xvr.Read())
      {
         Console.Write(".");
      }
   }
   catch (Exception ex)
   {
      // A validation error surfaces as an exception
      Console.WriteLine("\n{0}", ex.Message);
      return false;
   }
   finally
   {
      xvr.Close();   // Release the reader when done
   }
   return true;
}


Note that the XmlValidatingReader class derives from the abstract XmlReader class. We'll demonstrate using XmlReader to perform validation in the next section. In fact, in most cases, the .NET 2.0 implementation of XmlReader makes XmlValidatingReader obsolete.
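As a brief preview of the XmlReader approach, here is a minimal sketch that assumes .NET 2.0 or later. It is not the code from the next section: the ReaderValidation class and IsValid method are hypothetical names, and the XML and schema are passed as strings rather than file paths so the example runs in memory. When no validation event handler is attached, a schema violation is raised as an XmlSchemaValidationException.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ReaderValidation
{
    // Returns true if the XML text conforms to the schema text
    public static bool IsValid(string xml, string xsd)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;

        // Load the schema into the settings' schema set
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(xsd)))
        {
            settings.Schemas.Add(null, schemaReader);
        }

        try
        {
            // Reading every node triggers validation of each one
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlSchemaValidationException ex)
        {
            Console.WriteLine(ex.Message);
            return false;
        }
    }
}
```

One caveat with this sketch: with the default settings, an element that has no declaration in the schema at all produces only a warning, which is not reported, so it is worth testing with data that matches the schema's root element.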

Using an XML Style Sheet

A style sheet is a document that describes how to transform raw XML data into a different format. The mechanism that performs the transformation is referred to as an XSLT (Extensible Stylesheet Language Transformations) processor. Figure 10-1 illustrates the process: The XSLT processor takes as input the XML document to be transformed and the XSL document that defines the transformation to be applied. This approach permits output to be generated dynamically in a variety of formats. These include XML, HTML or ASPX for a Web page, and a PDF document.

Figure 10-1. Publishing documents with XSLT


The XslTransform Class

The .NET version of the XSLT processor is the XslTransform class found in the System.Xml.Xsl namespace. To demonstrate its use, we'll transform our XML movie data into an HTML file for display by a browser (see Figure 10-2).

Figure 10-2. XML data is transformed into this HTML output


Before the XslTransform class can be applied, an XSLT style sheet that describes the transformation must be created. Listing 10-5 contains the style sheet that will be used. As you can see, it is a mixture of HTML markup, XSL elements, and XSL commands that displays rows of movie information with four columns. The XSL elements and functions are the key to the transformation. When the XSL style sheet is processed, the XSL elements are replaced with the data from the original XML document.

Listing 10-5. XML Style Sheet to Create HTML Output

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <HTML>
      <TITLE>Movies</TITLE>
      <Table border="0" cellpadding="0" cellspacing="1">
        <THEAD>
          <TH>Movie Title</TH>
          <TH>Movie Year</TH>
          <TH>AFI Rank</TH>
          <TH>Director</TH>
        </THEAD>
        <xsl:for-each select="//movies">
          <xsl:sort select="movie_Title" />
          <tr>
            <td><xsl:value-of select="movie_Title" /></td>
            <td align="center"><xsl:value-of select="movie_Year" /></td>
            <td align="center"><xsl:value-of select="AFIRank" /></td>
            <td><xsl:value-of select="movie_Director" /></td>
          </tr>
        </xsl:for-each>
      </Table>
    </HTML>
  </xsl:template>
</xsl:stylesheet>


Some points of interest:

  • The URL in the namespace of the <xsl:stylesheet> element must be exactly as shown here.

  • The match attribute is set to an XPath pattern that indicates which nodes in the XML file the template applies to. Setting match="/" matches the document root, so the template governs the output for the entire document.

  • The for-each construct loops through the group of nodes selected by the XPath expression in its select attribute. XPath is discussed in Section 10.4, "Using XPath to Search XML."

  • The value-of element extracts a selected value from the XML document and inserts it into the output.

  • The <xsl:sort> element is used to sort the incoming data and is used in conjunction with the for-each construct. Here is its syntax:


select = XPath expression

order = {"ascending" | "descending"}

data-type = {"text" | "number"}

case-order = {"upper-first" | "lower-first"}


After a style sheet is created, applying it with the XslTransform class is straightforward, as the following code shows. After creating an instance of the class, you use its Load method to specify the file containing the style sheet. The XslTransform.Transform method performs the transformation. This method has several overloads. The version used here takes an XPathDocument object that represents the XML document and an XmlWriter that designates where the output is written, an HTML file in this case.


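// Requires: using System.Text; using System.Xml;
//           using System.Xml.XPath; using System.Xml.Xsl;
// Transform XML into HTML and store in movies.htm
XmlWriter writer = new
      XmlTextWriter("c:\\movies.htm", Encoding.UTF8);
XslTransform xslt = new XslTransform();
XPathDocument xpd = new
      XPathDocument("c:\\oscarwinners.xml");
xslt.Load("movies.xsl");
xslt.Transform(xpd, null, writer, null);
writer.Close();   // Flush the output and release the file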
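For readers on .NET 2.0 or later, the same load-and-transform pattern can be sketched with the XslCompiledTransform class, which superseded XslTransform in that release. This is a swapped-in alternative, not the chapter's code: the SortDemo class, its method name, and the embedded style sheet are hypothetical, and everything runs in memory. The style sheet also exercises the xsl:sort options described earlier by ordering movies numerically by year, newest first, and emitting the titles as plain text.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class SortDemo
{
    // Hypothetical style sheet: lists titles separated by semicolons
    private const string Xsl =
@"<xsl:stylesheet version='1.0'
      xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:template match='/'>
    <xsl:for-each select='//movies'>
      <xsl:sort select='movie_Year' data-type='number' order='descending'/>
      <xsl:value-of select='movie_Title'/>
      <xsl:text>;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    public static string TitlesNewestFirst(string xml)
    {
        // Load (and compile) the style sheet
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader xslReader = XmlReader.Create(new StringReader(Xsl)))
        {
            xslt.Load(xslReader);
        }

        // Apply it to the XML document and capture the output as a string
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        using (StringWriter sw = new StringWriter())
        {
            xslt.Transform(doc, null, sw);
            return sw.ToString();
        }
    }
}
```

Changing order to "ascending", or data-type to "text" with select='movie_Title', is an easy way to experiment with the sort attributes without regenerating an HTML file each time.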


Core Note

You can link a style sheet to an XML document by placing an xml-stylesheet processing instruction, whose href attribute names the style sheet, in the XML document on the line preceding the root element definition:


<?xml-stylesheet type="text/xsl" href="movies.xsl" ?>


If a document is linked to a style sheet that converts XML to HTML, most browsers automatically perform the transformation and display the HTML. This can be a quick way to perform trial-and-error testing when developing a style sheet.


It takes only a small leap from this simple XSLT example to appreciate the potential of being able to transform XML documents dynamically. It is a natural area of growth for Web Services and Web pages that accept input in one format on demand, transform it, and serve the output in a different format.
