XPath, which is short for XML Path Language, is a language for addressing parts of an XML document. Its name includes the word “path” because of the similarities between XML paths and file system paths. In a file system, for example, \Book\Chap13 identifies the Chap13 subdirectory of the root directory’s Book subdirectory. In an XML document, /Guitars/Guitar identifies all elements named Guitar that are children of the root element Guitars. “/Guitars/Guitar” is an XPath expression. XPath expressions are fully described in the XPath specification found at http://www.w3.org/TR/xpath.

XPath can be put to work in a variety of ways. Later in this chapter, you’ll learn about XSL Transformations (XSLT), which is a language for converting XML documents from one format to another. XSLT uses XPath expressions to identify nodes and node sets. Another common use for XPath is extracting data from XML documents. Used this way, XPath becomes a query language of sorts—the XML equivalent of SQL, if you will. The W3C is working on an official XML query language called XQuery (http://www.w3.org/TR/xquery), but for the moment, an XPath processor is the best way to extract information from XML documents without having to manually traverse DOM trees. The FCL comes with an XPath engine named System.Xml.XPath.XPathNavigator. Before we discuss it, let’s briefly review XPath.

XPath Basics

Expressions are the building blocks of XPath. The most common type of expression is the location path. The following location path evaluates to all Guitar elements that are children of a root element named Guitars:

/Guitars/Guitar

This one evaluates to all attributes (not elements) named Image that belong to Guitar elements that in turn are children of the root element Guitars:

/Guitars/Guitar/@Image

The next expression evaluates to all Guitar elements anywhere in the document:

//Guitar

The // prefix is extremely useful for locating elements in a document regardless of where they’re positioned.

XPath also supports wildcards. This expression selects all elements that are children of a root element named Guitars:

/Guitars/*

The next example selects all attributes belonging to Guitar elements anywhere in the document:

//Guitar/@*

Location paths can be absolute or relative. Paths that begin with / or // are absolute because they specify a location relative to the root. Paths that don’t begin with / or // are relative paths. They specify a location relative to the current node, or context node, in an XPath document.
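The difference between absolute and relative paths is easiest to see by running both from the same context node. The following is a minimal sketch, not a listing from this chapter: the inline XML is a made-up stand-in for Guitars.xml, the class and method names are illustrative, and it uses the System.Xml.XPath classes that are introduced later in the chapter.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class ContextNodeDemo
{
    // Hypothetical stand-in for the chapter's Guitars.xml
    const string Xml =
        "<Guitars>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make><Model>SG</Model></Guitar>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "</Guitars>";

    // Returns a navigator positioned on the first Guitar element
    public static XPathNavigator FirstGuitar ()
    {
        XPathNavigator nav = new XPathDocument (new StringReader (Xml)).CreateNavigator ();
        XPathNodeIterator guitars = nav.Select ("/Guitars/Guitar");
        guitars.MoveNext ();
        return guitars.Current.Clone ();
    }

    static void Main ()
    {
        XPathNavigator guitar = FirstGuitar ();

        // Relative path: only Make children of the context node (the first Guitar)
        Console.WriteLine ("Make   -> {0}", guitar.Select ("Make").Count);

        // Absolute path: every Make element in the document, whatever the context node
        Console.WriteLine ("//Make -> {0}", guitar.Select ("//Make").Count);
    }
}
```

With this sample document, the relative path selects one node and the absolute path selects two, because only the relative path depends on where the navigator is positioned.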

The components of a location path are called location steps. The following location path has two location steps:

/Guitars/Guitar

A location step consists of three parts: an axis, a node test, and zero or more predicates. The general format for a location step is as follows:

axis::node-test[predicate1][predicate2]...

The axis describes a relationship between nodes. Supported values include child, descendant, descendant-or-self, parent, ancestor, and ancestor-or-self, among others. If you don’t specify an axis, the default is child. Therefore, the expression

/Guitars/Guitar

could also be written

/child::Guitars/child::Guitar

Other axes can be used to qualify location paths in different ways. For example, this expression evaluates to all elements named Guitar that are descendants of the root element:

/descendant::Guitar

The next expression evaluates to all Guitar elements that are descendants of the root element or are themselves root elements:

/descendant-or-self::Guitar

In fact, // is shorthand for /descendant-or-self::node()/. Thus, the expression

//Guitar

is equivalent to the one above. Similarly, @ is shorthand for the attribute axis (attribute::). The statement

//Guitar/@*

can also be written

//Guitar/attribute::*

Most developers prefer the abbreviated syntax, but both syntaxes are supported by XPath 1.0–compliant expression engines.
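One way to convince yourself that the two syntaxes are interchangeable is to run each form against the same document and compare the results. The sketch below does that with the System.Xml.XPath classes covered later in this chapter. The inline XML is a made-up stand-in for Guitars.xml, and the class name and Count helper are illustrative only.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class AxisSyntaxDemo
{
    // Hypothetical stand-in for the chapter's Guitars.xml
    const string Xml =
        "<Guitars>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make></Guitar>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make></Guitar>" +
        "</Guitars>";

    // Counts the nodes that an XPath expression selects from the sample document
    public static int Count (string xpath)
    {
        XPathNavigator nav = new XPathDocument (new StringReader (Xml)).CreateNavigator ();
        return nav.Select (xpath).Count;
    }

    static void Main ()
    {
        // Each pair holds an abbreviated path and its unabbreviated equivalent
        string[][] pairs = {
            new[] { "/Guitars/Guitar", "/child::Guitars/child::Guitar" },
            new[] { "//Guitar", "/descendant-or-self::node()/child::Guitar" },
            new[] { "//Guitar/@*", "//Guitar/attribute::*" }
        };

        foreach (string[] pair in pairs)
            Console.WriteLine ("{0} selects {1}; {2} selects {3}",
                pair[0], Count (pair[0]), pair[1], Count (pair[1]));
    }
}
```

Comparing counts is only a rough check, but for a document this small it is enough to show that each pair selects the same nodes.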

The predicate is the portion of the location path, if any, that appears in square brackets. Predicates are nothing more than filters. For example, the following expression evaluates to all Guitar elements in the document:

//Guitar

But this one uses a predicate to narrow down the selection to Guitar elements having attributes named Image:

//Guitar[@Image]

The next one evaluates to all Guitar elements that have attributes named Image whose value is “MyStrat.jpeg”:

//Guitar[@Image = "MyStrat.jpeg"]

Predicates can include the following comparison operators: <, >, =, !=, <=, and >=. The following expression targets Guitar elements whose Year elements designate a year after 1980:

//Guitar[Year > 1980]

Predicates can also include and and or operators. This expression selects guitars manufactured after 1980 by Fender:

//Guitar[Year > 1980][Make = "Fender"]

The next expression does the same, but combines two predicates into one using the and operator:

//Guitar[Year > 1980 and Make = "Fender"]

Changing and to or identifies guitars that were manufactured by Fender or built after 1980:

//Guitar[Year > 1980 or Make = "Fender"]

XPath also supports a set of intrinsic functions that are often (but not always) used in predicates. The following expression evaluates to all Guitar elements having Make elements whose text begins with the letter G. The key is the starts-with function invoked in the predicate:

//Guitar[starts-with (Make, "G")]

The next expression uses the text function to return all text nodes associated with Make elements that are subelements of Guitar elements. Like DOM, XPath treats the text associated with an element as a separate node:

//Guitar/Make/text()

The starts-with and text functions are but two of many that XPath supports. For a complete list, refer to the XPath specification.
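To experiment with predicates and functions, it can help to run a handful of expressions against a small document and see how many nodes each one selects. The following is a minimal sketch that assumes a made-up two-guitar document in place of Guitars.xml. The class name and Count helper are illustrative, and the string literals inside the expressions use single quotes (which XPath also accepts) so that they don't need escaping inside C# strings.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class PredicateDemo
{
    // Hypothetical stand-in for the chapter's Guitars.xml
    const string Xml =
        "<Guitars>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make><Year>1977</Year></Guitar>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make><Year>1990</Year></Guitar>" +
        "</Guitars>";

    // Counts the nodes that an XPath expression selects from the sample document
    public static int Count (string xpath)
    {
        XPathNavigator nav = new XPathDocument (new StringReader (Xml)).CreateNavigator ();
        return nav.Select (xpath).Count;
    }

    static void Main ()
    {
        string[] expressions = {
            "//Guitar[@Image]",
            "//Guitar[@Image = 'MyStrat.jpeg']",
            "//Guitar[Year > 1980]",
            "//Guitar[Year > 1980][Make = 'Fender']",
            "//Guitar[Year > 1980 and Make = 'Fender']",
            "//Guitar[Year > 1980 or Make = 'Fender']",
            "//Guitar[starts-with (Make, 'G')]",
            "//Guitar/Make/text()"
        };

        // Print each expression next to the number of nodes it selects
        foreach (string expression in expressions)
            Console.WriteLine ("{0,-45} {1}", expression, Count (expression));
    }
}
```

Editing the sample XML (for example, adding a third guitar built before 1980 by Fender) is a quick way to see where the and and or forms start to differ.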

When executed by an XPath processor, a location path returns a node set. XPath, like DOM, uses tree-structured node sets to represent XML content. Suppose you’re given the XML document in Figure 13-3 and you execute the following location path against it:

/Guitars/Guitar

The resulting node set contains two nodes, each representing a Guitar element. Each Guitar element is the root of a node tree containing Make, Model, Year, Color, and Neck subelement nodes (Figure 13-9). Each subelement node is the parent of a text node that holds the element’s text. XPath node types are defined separately from DOM node types, although the two share many similarities. XPath defines fewer node types than DOM, which makes XPath node types a functional subset of DOM node types.

Figure 13-9
Node set resulting from an XPath expression.

XPathNavigator and Friends

The .NET Framework class library’s System.Xml.XPath namespace contains classes for putting XPath to work in managed applications. Chief among those classes are XPathDocument, which represents XML documents that you want to query with XPath; XPathNavigator, which provides a mechanism for performing XPath queries; and XPathNodeIterator, which represents node sets generated by XPath queries and lets you iterate over them.

The first step in performing XPath queries on XML documents is to create an XPathDocument wrapping the XML document itself. XPathDocument features a variety of constructors capable of initializing an XPathDocument from a stream, a URL, a file, a TextReader, or an XmlReader. The following statement creates an XPathDocument object and initializes it with the content found in Guitars.xml:

XPathDocument doc = new XPathDocument ("Guitars.xml");

Step two is to create an XPathNavigator from the XPathDocument. XPathDocument features a method named CreateNavigator for just that purpose. The following statement creates an XPathNavigator object from the XPathDocument created in the previous step:

XPathNavigator nav = doc.CreateNavigator ();

The final step is executing the query. XPathNavigator features five methods for executing XPath queries. The two most important are Evaluate and Select. Evaluate executes any XPath expression. It returns a generic Object that can be a string, a double, a bool, or an XPathNodeIterator, depending on the expression and the type of data that it returns. Select works exclusively with expressions that return node sets and is therefore an ideal vehicle for evaluating location paths. It always returns an XPathNodeIterator representing an XPath node set. The following statement uses Select to create a node set representing all nodes that match the expression “//Guitar”:

XPathNodeIterator iterator = nav.Select ("//Guitar");

XPathNodeIterator is a simple class that lets you iterate over the nodes returned in a node set. Its Count property tells you how many nodes were returned:

Console.WriteLine ("Select returned {0} nodes", iterator.Count);

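XPathNodeIterator’s MoveNext method lets you iterate over the node set a node at a time. As you iterate, XPathNodeIterator’s Current property exposes an XPathNavigator object that represents the current node. The following code iterates over the node set, displaying the type, name, and value of each node:

while (iterator.MoveNext ()) {
    Console.WriteLine ("Type={0}, Name={1}, Value={2}",
        iterator.Current.NodeType,
        iterator.Current.Name,
        iterator.Current.Value);
}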

The string returned by the XPathNavigator’s Value property depends on the node’s type and content. For example, if Current represents an attribute node or an element node that contains simple text (as opposed to other elements), then Value returns the attribute’s value or the text value of the element. If, however, Current represents an element node that contains other elements, Value returns the text of the subelements concatenated together into one long string.
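The three steps described above can be assembled into one small console program. This is a sketch under stated assumptions rather than a listing from the chapter: it loads a made-up inline document through a StringReader instead of reading Guitars.xml from disk, and the class and method names are illustrative. It also calls Evaluate once to show a non-node-set result next to the node set that Select returns.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class SelectDemo
{
    // Hypothetical stand-in for the chapter's Guitars.xml
    const string Xml =
        "<Guitars>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make><Model>SG</Model></Guitar>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "</Guitars>";

    // Steps 1 and 2: wrap the XML in an XPathDocument and create a navigator
    public static XPathNavigator CreateNavigator ()
    {
        XPathDocument doc = new XPathDocument (new StringReader (Xml));
        return doc.CreateNavigator ();
    }

    static void Main ()
    {
        XPathNavigator nav = CreateNavigator ();

        // Step 3: Select returns an iterator over the matching node set
        XPathNodeIterator iterator = nav.Select ("//Guitar");
        Console.WriteLine ("Select returned {0} nodes", iterator.Count);

        while (iterator.MoveNext ()) {
            // For an element with child elements, Value is the concatenated text
            Console.WriteLine ("Type={0}, Name={1}, Value={2}",
                iterator.Current.NodeType,
                iterator.Current.Name,
                iterator.Current.Value);
        }

        // Evaluate handles expressions that don't return node sets; count() yields a number
        object result = nav.Evaluate ("count(//Guitar)");
        Console.WriteLine ("Evaluate returned {0} ({1})", result, result.GetType ().Name);
    }
}
```

Because each sample Guitar element contains only child elements, the Value printed for it is the child text run together (for example, "GibsonSG"), which illustrates the concatenation behavior described above.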

Each node in the node set that Select returns can be a single node or the root of a tree of nodes. Traversing a tree of nodes encapsulated in an XPathNavigator is slightly different from traversing a tree of nodes in an XmlDocument. Here’s how to perform a depth-first traversal of the node trees returned by XPathNavigator.Select:
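A traversal along these lines might look like the following sketch. It is an assumption-laden reconstruction, not the chapter's original listing: the inline XML stands in for Guitars.xml, the OutputNode and CreateNavigator names and the indentation parameter are illustrative, and the code clones the iterator's Current navigator before moving it so that the iterator's own position is left untouched.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class TraversalDemo
{
    // Hypothetical stand-in for the chapter's Guitars.xml
    const string Xml =
        "<Guitars>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make><Model>SG</Model></Guitar>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "</Guitars>";

    public static XPathNavigator CreateNavigator ()
    {
        return new XPathDocument (new StringReader (Xml)).CreateNavigator ();
    }

    // Writes the current node, then recurses into its attributes and children.
    // On return, nav is positioned where it started.
    public static void OutputNode (XPathNavigator nav, TextWriter output, int depth)
    {
        output.WriteLine ("{0}Type={1}, Name={2}, Value={3}",
            new string (' ', depth * 2), nav.NodeType, nav.Name, nav.Value);

        if (nav.HasAttributes) {
            nav.MoveToFirstAttribute ();
            do {
                OutputNode (nav, output, depth + 1);
            } while (nav.MoveToNextAttribute ());
            nav.MoveToParent ();   // back from the last attribute to its element
        }

        if (nav.HasChildren) {
            nav.MoveToFirstChild ();
            do {
                OutputNode (nav, output, depth + 1);
            } while (nav.MoveToNext ());
            nav.MoveToParent ();   // back from the last child to its parent
        }
    }

    static void Main ()
    {
        XPathNodeIterator iterator = CreateNavigator ().Select ("//Guitar");
        while (iterator.MoveNext ())
            OutputNode (iterator.Current.Clone (), Console.Out, 0);
    }
}
```

The recursion visits each node before its attributes and children, which is what makes the traversal depth-first. The MoveToParent calls are what let a single navigator be reused for the whole tree.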




XPathNavigator features a family of Move methods that you can call to move any direction—up, down, or sideways—in a tree of nodes. This sample uses five of them: MoveToFirstAttribute, MoveToNextAttribute, MoveToParent, MoveToFirstChild, and MoveToNext. Observe also that the XPathNavigator itself exposes the properties of the nodes that you iterate over, in much the same manner as XmlTextReader.

So how might you put this knowledge to work in a real application? Look again at Figure 13-5. The application listed there uses XmlDocument to extract content from an XML document. Content can also be extracted—often with less code—with XPath. To demonstrate, the application in Figure 13-10 is the functional equivalent of the one in Figure 13-5. Besides demonstrating the basic semantics of XPathNavigator usage, it shows that you can perform subqueries on node sets returned by XPath queries by calling Select on the XPathNavigator exposed through an iterator’s Current property. XPathDemo first calls Select to create a node set representing all Guitar elements that are children of Guitars. Then it iterates through the node set, calling Select on each Guitar node to select the node’s Make and Model child elements.
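The subquery technique described above can be sketched as follows. This is a hedged approximation of what such a utility might look like, not the listing from Figure 13-10: it reads a made-up inline document instead of Guitars.xml, and the class name and MakesAndModels helper are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

static class SubqueryDemo
{
    // Hypothetical stand-in for the chapter's Guitars.xml
    const string Xml =
        "<Guitars>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make><Model>SG</Model></Guitar>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "</Guitars>";

    // Returns one "Make Model" string per Guitar element
    public static List<string> MakesAndModels ()
    {
        List<string> results = new List<string> ();
        XPathNavigator nav = new XPathDocument (new StringReader (Xml)).CreateNavigator ();

        // Outer query: every Guitar element that is a child of Guitars
        XPathNodeIterator guitars = nav.Select ("/Guitars/Guitar");

        while (guitars.MoveNext ()) {
            // Subqueries: relative paths evaluated against the current Guitar node
            XPathNodeIterator it = guitars.Current.Select ("Make");
            it.MoveNext ();
            string make = it.Current.Value;

            it = guitars.Current.Select ("Model");
            it.MoveNext ();
            string model = it.Current.Value;

            results.Add (make + " " + model);
        }
        return results;
    }

    static void Main ()
    {
        foreach (string guitar in MakesAndModels ())
            Console.WriteLine (guitar);
    }
}
```

The sketch assumes that every Guitar element has both a Make and a Model child. Production code would check the return value of MoveNext before reading Current.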





Figure 13-10
Utility that uses XPath to extract XML content.

A Do-It-Yourself XPath Expression Evaluator

To help you get acquainted with XPath, the application pictured in Figure 13-11 is a working XPath expression analyzer that evaluates XPath expressions against XML documents and displays the results. Like Microsoft SQL Server’s query analyzer, which lets you test SQL commands, the XPath expression analyzer—Expressalyzer for short—lets you experiment with XPath queries. To try it out, type a file name or URL into the Document box and click Load to point Expressalyzer to an XML document. Then type a location path into the Expression box and click the Execute button. The results appear in the tree view control in the lower half of the window.

Figure 13-11
Windows Forms XPath expression analyzer.

Expressalyzer’s source code appears in Figure 13-12. Expressalyzer is a Windows Forms application whose main form is an instance of AnalyzerForm. Clicking the Load button activates the form’s OnLoadDocument method, which wraps an XPathDocument around the data source. Clicking the Execute button activates the OnExecuteExpression method, which executes the expression by calling Select on an XPathNavigator obtained from the XPathDocument. If you need more real estate, resize the Expressalyzer window and the controls inside it will resize too. That little piece of magic results from the AnchorStyles assigned to the controls’ Anchor properties. For a review of Windows Forms anchoring, refer to Chapter 4.




        Text = "XPath Expression Analyzer";


        Source.Name = "Source";

        LoadButton.Text = "Load";

        DocumentGB.Text = "Document";

        Expression.Name = "Expression";

        ExecuteButton.Text = "Execute";

        ExpressionGB.Name = "ExpressionGB";
        ExpressionGB.Text = "Expression";


        XmlView.Name = "XmlView";












                text = text.Substring (0, 128) + "...";




Figure 13-12
Source code for an XPath expression analyzer.