
XML Overview

If you're not already familiar with working with XML, you may find all of the acronyms a bit confusing at first. However, XML syntax itself is fairly easy to understand.

The XML file

The first line of an XML file is usually the XML declaration, which states that the file is an XML document, that it conforms to version 1.0 of the XML specification, and that it uses the UTF-8 character encoding. Most XML documents include this declaration, although Access can also import XML documents that omit it. A typical declaration looks like this:

<?xml version="1.0" encoding="UTF-8" ?>

The body of the XML file consists of tags similar to the tags used in HTML. A start tag begins with an opening angle bracket (<) and ends with a closing angle bracket (>):

<Car>

End tags begin with an open angle bracket and a slash, and end with a closing angle bracket:

</Car>

Car is also the name of the element. While HTML works with a limited set of elements, XML allows you to create your own, as long as you conform to some basic rules:

  • Names can contain only alphanumeric characters, underscores (_), hyphens (-), and periods (.).

  • Element names cannot contain white space and must start with a letter or the underscore character.
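
As a rough illustration, the two rules above can be expressed as a single pattern. This is a minimal sketch in Python that assumes ASCII-only names; the helper `is_simple_xml_name` is hypothetical, and the full XML specification permits more characters (for example, colons used with namespaces and many non-ASCII letters) than this simplified check accepts.

```python
import re

# Simplified rule from the text, restricted to ASCII (an assumption):
# the first character is a letter or underscore; the remaining characters
# are letters, digits, underscores, periods, or hyphens.
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_simple_xml_name(name):
    """Return True if the whole name satisfies the simplified naming rules."""
    return SIMPLE_NAME.fullmatch(name) is not None


# "2Fast" starts with a digit and "My Car" contains white space.
for candidate in ["Car", "_price.usd-1", "2Fast", "My Car"]:
    print(candidate, is_simple_xml_name(candidate))
```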

The values in XML elements are found between the start tag and end tag, similarly to the way that text is represented in HTML. In this example, the Car element has a value of Mini Cooper:

<Car>Mini Cooper</Car>

XML elements can be nested, but they can't overlap. The Car element can have sub-elements, such as Make, Model and Price:

<Car>
  <Make>Mini Cooper</Make>
  <Model>S</Model>
  <Price>$20,000</Price>
</Car>
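
The nesting rule can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module, which is an assumption of this example rather than anything Access requires; the `is_well_formed` helper and the overlapping sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Properly nested: each sub-element is closed before its parent is closed.
nested = (
    "<Car><Make>Mini Cooper</Make><Model>S</Model>"
    "<Price>$20,000</Price></Car>"
)

# Overlapping: <Model> is opened inside <Make> but closed outside it.
overlapping = "<Car><Make>Mini Cooper<Model></Make>S</Model></Car>"


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Read each sub-element's name and value from the nested version.
car = ET.fromstring(nested)
values = {child.tag: child.text for child in car}

print(values)
print(is_well_formed(nested), is_well_formed(overlapping))
```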

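Note that the spaces, tabs, and line feeds used to indent elements are typically ignored when the document is processed. They are there only to make XML documents more readable. White space inside an element's value, however, is part of that value.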
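
Strictly speaking, a parser reports white space to the application, which then decides whether it matters. The following sketch, again assuming Python's standard `xml.etree.ElementTree` module, uses a made-up fragment with a deliberate leading space inside Make to show the difference between indentation and white space within a value.

```python
import xml.etree.ElementTree as ET

# Indented fragment; the space before "Mini Cooper" is deliberate.
car = ET.fromstring("<Car>\n  <Make> Mini Cooper</Make>\n</Car>")
make = car.find("Make")

# White space between <Car> and <Make> is pure indentation.
indentation = car.text

# White space inside the value is kept exactly as written.
raw_value = make.text

# An application that does not want it has to trim it explicitly.
clean_value = raw_value.strip()

print(repr(indentation), repr(raw_value), repr(clean_value))
```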


You can also have multiple nested sets of elements in the same XML file, and elements can be repeated:

<Car>
  <Make>Mini Cooper</Make>
  <Model>S</Model>
  <Price>$20,000</Price>
</Car>
<Car>
  <Make>Lexus</Make>
  <Model>LS430</Model>
  <Price>$60,000</Price>
</Car>

Root elements and namespaces

The above sample alone would not constitute a well-formed XML file. Every XML document must have a single root, or top-level, element. This allows the XML file to be represented as a tree, with all of the other elements as branches off the root element. In this example, the root element is named dataroot, and its start tag has a namespace declaration:

<?xml version="1.0" encoding="UTF-8" ?>
<dataroot xmlns:od="urn:schemas-microsoft-com:officedata">
  <Car>
    <Make>Mini Cooper</Make>
    <Model>S</Model>
    <Price>$20,000</Price>
  </Car>
  <Car>
    <Make>Lexus</Make>
    <Model>LS430</Model>
    <Price>$60,000</Price>
  </Car>
</dataroot>

There are three parts to the namespace declaration:

  • xmlns marks the attribute on the dataroot element as an XML namespace declaration.

  • od is the prefix assigned to the namespace.

  • "urn:schemas-microsoft-com:officedata" is the Uniform Resource Identifier, or URI, which uniquely identifies the namespace. Access generates this particular namespace declaration whenever you save Access data in XML format.

In this example, only one namespace is declared, and none of the element names carries its prefix. Multiple namespaces can be used in a single XML document, however. In that case, the prefix assigned to each namespace is added to the element names to identify which namespace they belong to. This makes it possible to distinguish between identically named elements from different namespaces.
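
To make the idea of prefixes concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. Both namespaces (`urn:example:list` and `urn:example:dealer`), their prefixes, and the prices are hypothetical and chosen only to show two identically named Price elements being told apart.

```python
import xml.etree.ElementTree as ET

# Two hypothetical namespaces, each defining its own Price element.
doc = """<Car xmlns:list="urn:example:list"
     xmlns:dealer="urn:example:dealer">
  <list:Price>$20,000</list:Price>
  <dealer:Price>$18,500</dealer:Price>
</Car>"""

# Prefix-to-URI mapping used for lookups. The URIs identify the namespaces;
# the prefixes are only shorthand and could differ from those in the file.
ns = {"list": "urn:example:list", "dealer": "urn:example:dealer"}

root = ET.fromstring(doc)
list_price = root.findtext("list:Price", namespaces=ns)
dealer_price = root.findtext("dealer:Price", namespaces=ns)

# Internally, ElementTree stores each name together with its namespace URI.
expanded_names = [child.tag for child in root]

print(list_price, dealer_price)
print(expanded_names)
```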

When you view an XML file in a browser, you can see the hierarchy of data, as shown in Figure 18-1.

Figure 18-1. Viewing the XML file in a browser window

Clicking the plus sign (+) expands the tree view so that you can view the data in the nested elements.
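
The single-root rule and the resulting tree can also be checked programmatically. This sketch assumes Python's standard `xml.etree.ElementTree` module; the `parses` and `outline` helpers are hypothetical, and the namespace declaration is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Two sibling Car elements with no enclosing root element.
cars = (
    "<Car><Make>Mini Cooper</Make><Model>S</Model><Price>$20,000</Price></Car>"
    "<Car><Make>Lexus</Make><Model>LS430</Model><Price>$60,000</Price></Car>"
)


def parses(xml_text):
    """Return True if the parser accepts the text as a complete document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def outline(element, depth=0):
    """Return the element names as an indented list, one line per element."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# Without a root the parser rejects the text; wrapped in dataroot it parses.
without_root = parses(cars)
with_root = "<dataroot>" + cars + "</dataroot>"

tree_lines = outline(ET.fromstring(with_root))
print(without_root)
print("\n".join(tree_lines))
```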

Attributes

Another option is to represent the data using attributes in addition to elements. Each attribute has a name and a value, as shown in this example where each Car element has a Make, Model, and Price attribute:

<?xml version="1.0" encoding="UTF-8" ?> 
<dataroot xmlns:od="urn:schemas-microsoft-com:officedata">
  <Car Make="Mini Cooper" Model="S" Price="$20,000" /> 
  <Car Make="Lexus" Model="LS430" Price="$60,000" /> 
</dataroot>

You can represent the data as either elements or attributes. However, when you import or export XML data with Access, you have no choice: you must use elements, not attributes, for Access to be able to parse the XML file correctly. If your XML input is structured using attributes, you may not like the way that Access imports the data. To get around this limitation, you can convert your XML to the element-based format that Access expects by using an XML technology named Extensible Stylesheet Language Transformations, or XSLT.
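
The reshaping itself is easy to picture. The sketch below is not XSLT; it uses Python's standard `xml.etree.ElementTree` module only to illustrate what converting attribute-based data to element-based data means. The `attributes_to_elements` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

attribute_based = """<dataroot xmlns:od="urn:schemas-microsoft-com:officedata">
  <Car Make="Mini Cooper" Model="S" Price="$20,000" />
  <Car Make="Lexus" Model="LS430" Price="$60,000" />
</dataroot>"""


def attributes_to_elements(xml_text):
    """Turn every attribute of each Car into a child element of that Car."""
    root = ET.fromstring(xml_text)
    # Take a snapshot of the Car elements before changing the tree.
    for car in list(root.iter("Car")):
        for name, value in list(car.attrib.items()):
            child = ET.SubElement(car, name)
            child.text = value
        car.attrib.clear()
    return root


converted = attributes_to_elements(attribute_based)
first_car = converted.find("Car")

# Note: ElementTree writes only the namespace declarations that are in use,
# so the unused od declaration does not appear in this output.
print(ET.tostring(converted, encoding="unicode"))
```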

Extensible Stylesheet Language Transformations (XSLT)

XSLT is an XML-based language for transforming an XML document into another form. The result can be another XML document or any other kind of text document. XSLT combines some procedural features with rule-based features. XSLT stylesheets are themselves XML documents that define templates and how to apply them. Each template contains a rule for matching XML elements or attributes in the document being transformed, along with instructions for reformatting whatever it matches. You will often hear XSLT stylesheets referred to as "XSLT transforms," or simply "transforms." In Access 2003, you can use XSLT for transforming XML both when importing and when exporting data.
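
To show what templates and matching rules look like, here is a small hand-written XSLT 1.0 stylesheet that turns attributes into child elements. It is an illustrative sketch, not a stylesheet produced by or shipped with Access. It is wrapped in Python so that the example stays self-contained. Python's standard library can parse the stylesheet because it is ordinary XML, but it cannot run it. The optional final step assumes the third-party lxml package is installed.

```python
import xml.etree.ElementTree as ET

# Hand-written XSLT 1.0 stylesheet (illustrative only).
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Rule 1: copy every non-attribute node, then process its contents. -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
  <!-- Rule 2: replace each attribute with a child element of the same name. -->
  <xsl:template match="@*">
    <xsl:element name="{name()}">
      <xsl:value-of select="." />
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>"""

SOURCE = """<dataroot>
  <Car Make="Mini Cooper" Model="S" Price="$20,000" />
</dataroot>"""

# The stylesheet is itself XML, so a plain XML parser can list its templates.
XSL = {"xsl": "http://www.w3.org/1999/XSL/Transform"}
stylesheet_root = ET.fromstring(STYLESHEET)
patterns = [t.get("match") for t in stylesheet_root.findall("xsl:template", XSL)]
print(patterns)

# Running the transform needs an XSLT processor. lxml is one option, but it
# is a third-party package, so it is treated as optional here.
try:
    from lxml import etree as lxml_etree
except ImportError:
    lxml_etree = None

if lxml_etree is not None:
    transform = lxml_etree.XSLT(lxml_etree.fromstring(STYLESHEET))
    result = transform(lxml_etree.fromstring(SOURCE))
    print(str(result))
```

The first template is a rule that matches everything except attributes and copies it through. The second matches attributes and rewrites them, which is the rule-based style the paragraph above describes.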

XML Schema Definition (XSD)

XSD provides a way of describing the structure of data contained in an XML file, as well as constraints applied to the data, including data types. This is similar to the table definitions and relationships you use to define data structure in Access.

When you export data, you can have Access generate a schema, or XSD, file that describes the data. When importing XML, you can import an XSD file to define the structure and data types of the data being imported. When you import XSD files, Access creates tables based on the definitions in the files.
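
The comparison with table definitions can be made concrete with a small example. The schema below is hypothetical and written by hand for the Car data; a schema generated by Access may contain additional Access-specific details that are not shown here. The sketch uses Python's standard `xml.etree.ElementTree` module only to read the schema, not to validate data against it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written schema describing one Car record.
CAR_SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Car">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Make" type="xsd:string" />
        <xsd:element name="Model" type="xsd:string" />
        <xsd:element name="Price" type="xsd:decimal" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

XSD = {"xsd": "http://www.w3.org/2001/XMLSchema"}

schema = ET.fromstring(CAR_SCHEMA)

# The top-level element plays the role of a table ...
table = schema.find("xsd:element", XSD)
table_name = table.get("name")

# ... and the elements in its sequence play the role of typed fields.
fields = table.findall("xsd:complexType/xsd:sequence/xsd:element", XSD)
columns = {field.get("name"): field.get("type") for field in fields}

# With this schema, a Price of 20000 would satisfy xsd:decimal, whereas the
# formatted text "$20,000" would not.
print(table_name, columns)
```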
