XML

The XML logo of the World Wide Web Consortium (W3C)

XML is one of the most widely used technologies for tagging publications. For many companies, XML is essential to their publication processes. Today, many publications and their metadata are stored in XML in a media-neutral way, i.e. regardless of their subsequent intended use. Using programming languages such as XSLT or Ruby, the XML data can then be converted into other forms, e.g. a print template, an e-book or a website (single source publishing).

XML stands for Extensible Markup Language, i.e. it is a markup language that can be extended. XML is text-based, readable by both machines and humans, and can be used regardless of operating system and platform. Here is a very simple example of marking up a bookshop's range of titles with XML:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book>
    <title>Waiting for Godot</title>
    <authors>
      <person>Samuel Beckett</person>
    </authors>
    <cover href="beckett-godot.jpg"/>
  </book>
</bookstore>

The first line contains the XML declaration, which identifies the file as XML and states the XML version and character encoding. The XML syntax essentially consists of so-called elements and attributes. The content is marked up with elements, which are names in angle brackets that frame the content. XML is structured hierarchically: elements can contain further elements, e.g. the element <book> contains the elements <title>, <authors> and <cover>. Attributes can be used to give the elements additional properties, e.g. for the <cover> element, the href attribute gives the file name of the cover image.
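To make the roles of elements and attributes concrete, the following minimal sketch reads the bookshop sample with Python's standard library module xml.etree.ElementTree. The function name describe_books and the shape of the returned dictionaries are assumptions made for this illustration; they are not part of XML itself.

```python
import xml.etree.ElementTree as ET

# The bookshop sample from above, passed as bytes so that the parser
# honours the encoding named in the XML declaration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book>
    <title>Waiting for Godot</title>
    <authors>
      <person>Samuel Beckett</person>
    </authors>
    <cover href="beckett-godot.jpg"/>
  </book>
</bookstore>
"""


def describe_books(xml_bytes):
    """Return one dict per <book> with its title, authors and cover file.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_bytes)        # root element: <bookstore>
    books = []
    for book in root.findall("book"):      # <book> children of the root
        cover = book.find("cover")         # None if the book has no <cover>
        books.append({
            # Element content is read as text ...
            "title": book.findtext("title"),
            "authors": [p.text for p in book.findall("authors/person")],
            # ... while attributes are read by name from the element.
            "cover": cover.get("href") if cover is not None else None,
        })
    return books


if __name__ == "__main__":
    print(describe_books(SAMPLE))
```

The nested findall("authors/person") call mirrors the hierarchy of the markup, and cover.get("href") shows that an attribute belongs to its element rather than to the element's content.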

XML has the advantage over many other markup languages that the permitted structure of elements and attributes can be defined using a schema. To stick with our example, a schema could stipulate that the element <cover> may only appear inside a <book> element, that it may appear there only once, and that it must carry an href attribute. Schema languages such as RelaxNG or XML Schema can be used to write such schemas, and documents can then be validated against them with the appropriate tools.
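The three rules from this example can be spelled out as a hand-written check, which may help to show what a validator does. This is only a sketch under stated assumptions: it is not RelaxNG or XML Schema, which express such rules declaratively, and the function name check_cover_rules and its messages are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def check_cover_rules(xml_text):
    """Return a list of rule violations; an empty list means all rules hold.

    Hypothetical stand-in for a real schema validator.
    """
    root = ET.fromstring(xml_text)
    problems = []

    # ElementTree elements do not know their parent, so build a lookup.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    for cover in root.iter("cover"):
        parent = parent_of.get(cover)
        # Rule 1: <cover> may only appear inside <book>.
        if parent is None or parent.tag != "book":
            problems.append("<cover> outside <book>")
        # Rule 3: <cover> must carry an href attribute.
        if "href" not in cover.attrib:
            problems.append("<cover> without href")

    for book in root.iter("book"):
        # Rule 2: at most one <cover> per <book>.
        if len(book.findall("cover")) > 1:
            problems.append("more than one <cover> in <book>")

    return problems


if __name__ == "__main__":
    ok = '<bookstore><book><cover href="a.jpg"/></book></bookstore>'
    bad = "<bookstore><cover/></bookstore>"
    print(check_cover_rules(ok))
    print(check_cover_rules(bad))
```

In practice a schema is preferable to code like this, because the same schema file can be shared between editors, validators and conversion tools.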

Frequently used schemas are TEI (Text Encoding Initiative) for critical editions in the humanities, JATS (Journal Article Tag Suite) for scientific articles or NISO STS for technical standards. However, XML also plays an important role for smaller-scale data: ONIX can be used to describe and exchange bibliographic metadata and SVG and MathML are the most important formats for displaying vector graphics and mathematical formulas on the World Wide Web.

XML is used in many industries and technologies. Many well-known office and publishing applications are also based on XML or have XML interfaces: the Microsoft Word and Excel file formats and Adobe InDesign's IDML (InDesign Markup Language) are all based on XML. XML editors can also be used to create publications directly in XML.

XML was officially published as a W3C standard back in 1998. Since then, it has spread rapidly, but it has also faced competition from more lightweight formats such as JSON and Markdown. Nevertheless, for many publications, there is no markup language that is more efficient, robust and versatile than XML.