1.2 The XML Family of Standards
XML was specifically
designed to combine the flexibility of SGML with the simplicity of
Hypertext Markup Language (HTML). HTML, the markup language upon
which the World Wide Web is based, is an application of an older and
more complex language known as Standard Generalized Markup Language
(SGML). SGML was created to provide a standardized language for
complex documents, such as airplane repair manuals and parts lists.
HTML, on the other hand, was designed for the specific purpose of
creating documents that could be displayed by a variety of different
web browsers. As such, HTML provides only a subset of
SGML's functionality and is limited to features that
make sense in a web browser. XML takes a broader view.
There are several types of tasks
you'll typically want to perform with XML documents.
XML documents can be read into arbitrary data structures, manipulated
in memory, and written back out as XML. Existing objects can be
written (or serialized, to use the technical term) to a number of
different XML formats, including ones that you define, as well as
standard serialization formats. The technologies most commonly used
to perform these operations are the following:
- Reading XML
In order to read an XML document into memory, you need to
parse it. There are a variety of XML parsers that
can be used to read XML, and I discuss the .NET implementation in
- Writing XML
After either reading XML in or creating an XML representation in
memory, you'll most likely need to
write it out to an XML file. This is the flip
side of parsing, and it's covered in Chapter 3.
- Non-XML formats
You can use the same APIs you use to read and write XML to read and
write other formats. I explore how this works in Chapter 4.
- DOM
Once it has been read into memory, you can
manipulate an XML document's
tree structure through the Document Object Model (DOM). The DOM
specification was developed to introduce a platform-independent model
for XML documents. The DOM is discussed in Chapter 5.
- XPath
You will sometimes want to
locate a particular element or attribute in the content of an XML
document. The XPath specification provides the mechanism used to
navigate an XML document. I talk about XPath in
- XSLT
Different organizations often
develop different markup languages for the same problem domain. In
those cases, it can be useful to transform an
existing XML document in one format into another document in another
format. Extensible Stylesheet Language Transformations (XSLT) was developed
to enable you to convert XML documents into other XML and non-XML
formats. XSLT is discussed in Chapter 7.
- XML Schema
The original XML specification included the Document Type Definition
(DTD), which allows you to specify the structure of an XML document.
The XML Schema standard allows you to constrain
an XML document in a more formal manner than a DTD. Using an XML
Schema, you can ensure that a document's structure and content fit the
expected model. I discuss XML Schema in Chapter 8.
- Serialization
In addition to the XML technologies listed above, there are specific XML
syntaxes used for specific purposes. One such purpose is
serializing objects into XML. Objects can be
serialized to an arbitrary XML syntax, or they can be serialized to
the Simple Object Access Protocol (SOAP). I discuss serialization in
- Web Services
Web Services allows
for the sharing of resources on a network as if
they were local through XML syntaxes such as SOAP, Web Services
Description Language (WSDL), and Universal Description, Discovery, and
Integration (UDDI). Web Services provides the foundation for .NET
remoting, although Web Services is, by its nature, an open framework
that is operating system- and hardware-independent. Although Web
Services as a topic can fill several volumes, I talk about it briefly
in Chapter 10.
- Database access
Most modern software applications
are concerned in some way with storing and
accessing data. While XML can itself be used as a rudimentary data
store, relational database management systems, such as SQL Server,
DB2, and Oracle, are much better at providing quick, reliable access
to large amounts of data. Like Web Services, database access is a
huge topic; I'll try to give you a taste for
XML-related database access issues in Chapter 11.
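As a rough illustration of several tasks from the list above (reading, locating, manipulating, writing, and serializing an object), here is a minimal sketch. It uses Python's standard xml.etree.ElementTree module rather than the .NET classes covered in the later chapters, so treat it as a picture of the general workflow only. The inventory document, its element and attribute names, the Item class, and the update_price and item_to_element helpers are all invented for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample document; the element and attribute names are invented.
SOURCE = """<inventory>
  <item sku="A100"><name>Widget</name><price>2.50</price></item>
  <item sku="B200"><name>Gadget</name><price>10.00</price></item>
</inventory>"""


def update_price(xml_text: str, sku: str, new_price: str) -> str:
    """Parse a document, change one item's price, and return the new XML text."""
    # Reading: parse the text into an in-memory element tree.
    root = ET.fromstring(xml_text)

    # Locating: ElementTree understands a limited subset of XPath.
    item = root.find(f"./item[@sku='{sku}']")
    if item is None:
        raise KeyError(sku)

    # Manipulating: modify the tree in memory.
    item.find("price").text = new_price
    item.set("updated", "true")

    # Writing: serialize the modified tree back out as XML text.
    return ET.tostring(root, encoding="unicode")


@dataclass
class Item:
    """Hypothetical object to be serialized to an XML syntax of our own design."""
    sku: str
    name: str
    price: str


def item_to_element(item: Item) -> ET.Element:
    """Serialize an Item to the same ad hoc <item> format used in SOURCE."""
    elem = ET.Element("item", {"sku": item.sku})
    ET.SubElement(elem, "name").text = item.name
    ET.SubElement(elem, "price").text = item.price
    return elem


print(update_price(SOURCE, "B200", "12.00"))
print(ET.tostring(item_to_element(Item("C300", "Gizmo", "4.00")), encoding="unicode"))
```

The four commented steps in update_price correspond to the reading, XPath, DOM, and writing items in the list, although ElementTree is a lighter-weight tree model than the W3C DOM and supports only part of XPath. XSLT and XML Schema validation are left out because Python's standard library does not provide them.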
Since its invention, XML has gone far beyond HTML's role as a language
for web site design. It has acquired a host of related
technologies, such as XHTML, XPath, XSLT, XML Schema, SOAP, WSDL, and
UDDI. Some of these are syntaxes of XML, some simply add
value to XML, and some do both.
I've just introduced a lot of acronyms, so look at
Figure 1-2 for a visual representation of the
relationships between some of these standards.
Figure 1-2. SGML and its progeny