Is your XML well-formed and valid?

CIOL Bureau

Extensible Markup Language (XML) is a W3C standard for encoding data in a platform-
and application-independent way. XML makes it possible to publish and exchange data on the
Web and across diverse applications. This article briefly describes XML as per the
XML 1.0 specification of the W3C (http://www.w3.org/TR/REC-xml).

Elements and Attributes

To create a list of players in the Indian cricket team using HTML, we would probably use
list tags such as <ul> or <ol>. Further, formatting tags may be used
to make it more visually appealing. A program that has to extract meaningful data from the
HTML file must perform complex parsing and sift through a lot of presentation tags.

In XML, you can define your own elements and attributes,
thereby describing your data more accurately. XML data has semantic structure. We have
used a handful of elements, among them TEAM, MEMBER, and WICKETS, to partially describe
our national cricket team. One of these elements has two
attributes: batsman and bowler.


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Sachin
26
78

Anil Kumble
28
10
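
As a rough illustration of why semantic markup is easier to process than presentation markup, the sketch below parses a hypothetical reconstruction of the team document with Python's standard xml.etree.ElementTree module. TEAM, MEMBER, WICKETS and the batsman and bowler attributes come from this article. The NAME and AGE element names, the placement of the attributes on MEMBER, the yes/no attribute values, and the mapping of the third figure to WICKETS are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the team file. NAME, AGE, the attribute
# values and the use of WICKETS for the third figure are assumptions.
TEAM_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<TEAM>
  <MEMBER batsman="yes" bowler="no">
    <NAME>Sachin</NAME>
    <AGE>26</AGE>
    <WICKETS>78</WICKETS>
  </MEMBER>
  <MEMBER batsman="no" bowler="yes">
    <NAME>Anil Kumble</NAME>
    <AGE>28</AGE>
    <WICKETS>10</WICKETS>
  </MEMBER>
</TEAM>
"""


def list_members(xml_text):
    """Return one dict per MEMBER element found directly under the root."""
    # Parse from bytes so the encoding named in the XML declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    members = []
    for member in root.findall("MEMBER"):
        members.append({
            "name": member.findtext("NAME"),
            "age": int(member.findtext("AGE")),
            "batsman": member.get("batsman"),
            "bowler": member.get("bowler"),
        })
    return members


for entry in list_members(TEAM_XML):
    print(entry["name"], entry["age"], entry["batsman"], entry["bowler"])
```

Each value is addressed by the name of its element or attribute, so the program never has to skip over presentation tags.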

Well-formed XML

Because XML is a highly structured language, all documents must strictly conform
to certain rules as per the XML 1.0 specification. Some of these rules are:

- Every document must have a single
unique root element that encloses all other elements within it. In our example,
<TEAM> is the root tag.

- All elements must have
corresponding start and end tags. Note that XML elements are case sensitive.

- They must be cleanly nested and not
overlap, e.g. <TEAM><MEMBER></TEAM></MEMBER> is incorrect nesting.

- Attribute values must be enclosed
within quotes (single or double).

The moment an XML parser detects a violation, the
processing ceases. Compare this to an HTML parser (in a Web browser) that ignores syntax
errors.

An XML document that meets all structural criteria is said
to be well-formed. The necessary condition for a document to qualify as XML is
that it should be well-formed.
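
These rules can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which is a non-validating parser: it checks well-formedness only. The helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser stops at the first violation it meets.
        return False, str(err)
    return True, None


# Illustrative samples, one per rule discussed in the text.
samples = {
    "ok": '<TEAM><MEMBER bowler="yes">Anil Kumble</MEMBER></TEAM>',
    "two roots": "<TEAM/><TEAM/>",
    "case mismatch": "<TEAM></team>",
    "overlap": "<TEAM><MEMBER></TEAM></MEMBER>",
    "unquoted attribute": "<TEAM><MEMBER bowler=yes/></TEAM>",
}

for label, text in samples.items():
    ok, error = is_well_formed(text)
    status = "well-formed" if ok else "rejected: " + error
    print(label, "->", status)
```

Only the first sample is accepted. Each of the others triggers a parse error, and the parser reports the first problem rather than trying to recover the way a browser does with HTML.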

Valid XML

If data must be exchanged between applications reliably, the structure of the .xml
file must be known in advance. The Document Type Definition (DTD) is like a template
against which an XML document is validated for accuracy and conformance to an agreed
standard.

The DTD for our cricket team data file would look something
like this:


<!ELEMENT TEAM (MEMBER+)>
<!ELEMENT ... WICKETS?)>
<!ATTLIST ... CDATA #IMPLIED>

The DTD clearly lists out the structure of
the document in terms of its elements and attributes. It specifies which elements (or
attributes) are optional and which aren't: an optional element carries an appended "?",
and an optional attribute is declared #IMPLIED. It also specifies
whether an element contains more elements or just text.
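
The suffixes in a DTD content model can be read as occurrence counts. The sketch below is one way to picture that, not part of any XML library: the helper occurrence_bounds is hypothetical, and it maps a single name such as MEMBER+ or WICKETS? to a minimum and maximum count. The "+" and "?" suffixes appear in the DTD above. The "*" suffix (zero or more) is also part of DTD syntax, although this article's DTD does not use it.

```python
def occurrence_bounds(particle):
    """Map a DTD content particle to (name, minimum, maximum).

    A maximum of None means "no upper limit".
    """
    indicators = {
        "?": (0, 1),     # optional: zero or one
        "+": (1, None),  # required and repeatable: one or more
        "*": (0, None),  # optional and repeatable: zero or more
    }
    if particle and particle[-1] in indicators:
        low, high = indicators[particle[-1]]
        return particle[:-1], low, high
    # No suffix: the element must appear exactly once.
    return particle, 1, 1


for particle in ("MEMBER+", "WICKETS?"):
    print(occurrence_bounds(particle))
```

Read this way, TEAM (MEMBER+) says that a team holds at least one member, while a trailing WICKETS? says that the wickets entry may be left out.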

An XML document that conforms to a DTD is said to be valid.
Validity is a sufficient condition for an XML 1.0 document, i.e. all valid XML documents
are well-formed (and hence conform to the W3C XML 1.0 specification), but being
well-formed alone does not make a document valid.
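
The difference between the two conditions can be sketched in a few lines of Python. The standard xml.etree.ElementTree module does not validate against a DTD, so the function below is a hand-written stand-in that checks a single rule, TEAM (MEMBER+), under the assumption that TEAM is the root element. A real validating parser would read the DTD itself and check every declaration. The function name and the COACH element are made up for illustration.

```python
import xml.etree.ElementTree as ET


def conforms_to_team_rule(xml_text):
    """Check only the content model TEAM (MEMBER+).

    Raises ET.ParseError if xml_text is not even well-formed.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "TEAM":
        return False
    children = list(root)
    # Element content allows whitespace between children, but no other text.
    text_parts = [root.text] + [child.tail for child in children]
    if any(part and part.strip() for part in text_parts):
        return False
    # MEMBER+ means at least one child, and every child must be a MEMBER.
    return len(children) >= 1 and all(child.tag == "MEMBER" for child in children)


print(conforms_to_team_rule("<TEAM><MEMBER/><MEMBER/></TEAM>"))
print(conforms_to_team_rule("<TEAM/>"))
print(conforms_to_team_rule("<TEAM><COACH/></TEAM>"))
```

The second and third documents parse without error, so they are well-formed, yet they break the TEAM (MEMBER+) rule and would not be valid against the DTD.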

Applied XML

MathML is a fine application of XML. The MathML DTD is an international standard
that defines a common format for storing complex mathematical equations in
XML. How the equation is rendered depends on the application. It can be displayed in a
word processor, mathematically operated on in a spreadsheet, or read aloud by voice
synthesis software.

The Synchronised Multimedia Integration Language (SMIL)
allows multimedia content providers to integrate independent multimedia objects into a
synchronised presentation. Again, an SMIL presentation is described in terms of an SMIL
DTD.

A single format for data encoding

Different applications use their own methods of recording data. As a
result, there is a proliferation of proprietary data formats. With XML, diverse applications
will have a standard way of encoding and sharing data.

Design Goals for XML
One of the key design goals for XML is
that it shall be directly usable over the Internet and support a wide variety of
applications. XML is a much simpler subset of SGML that is more readable, formal, and
concise. Creating data files in XML is a simple process.

A previously published article
on CIOL gives the historical background of SGML, HTML and XML.
