Schematron

Validating XML data against a DTD is frequently not sufficient, particularly if the DTD (or XML schema) is too generic. On the other hand, a DTD that is too finely differentiated may fail to cover many cases that occur in practice and that ought to be representable. This latter case leads to elements or attributes of the DTD being used for purposes other than those intended, or to authors no longer paying attention to validation at all. This is highly unsatisfactory, because validation against a DTD or schema produces a binary result: valid or not valid. Such grammar-based checks cannot express that validity violations differ in their effects, and that some of them may be quite harmless for a particular purpose. A further weakness of DTDs is that they cannot check conditional relationships (for example: if the ISBN starts with 978-4-711, an English translation of the title and the blurb must be included).

This is where Schematron rules come in, as an alternative or supplement to a DTD. These rules can be set up flexibly for any XML document type (including the XML formats of Word and InDesign). For example, in data that is otherwise invalid against the DTD, they make it possible to detect precisely those issues that are known to make EPUB publishing or automatic page makeup impossible. In addition, warnings can be generated if particular vocabulary or particular abbreviations appear, or if they do not appear at all.
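The conditional ISBN requirement mentioned earlier, together with a graded severity, can be sketched roughly as follows. This is a Python emulation using only the standard library, not actual Schematron. The element names (book, isbn, title, blurb), the use of xml:lang, the list of discouraged abbreviations, and the helper functions are all assumptions made for illustration. In actual Schematron, the same logic would be written as a rule whose context selects the matching books and whose assertions test for the English elements.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the xml:lang attribute with its full namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical house-style list of abbreviations that should trigger a warning.
DISCOURAGED = ("etc.",)

# Hypothetical sample data; the element names are assumptions, not a real DTD.
SAMPLE = """
<catalog>
  <book>
    <isbn>978-4-711-00001-2</isbn>
    <title xml:lang="ja">Original title</title>
    <title xml:lang="en">Translated title</title>
    <blurb xml:lang="ja">Original blurb</blurb>
  </book>
  <book>
    <isbn>978-3-16-148410-0</isbn>
    <title xml:lang="de">Beispiele etc.</title>
  </book>
</catalog>
"""


def has_english(book, tag):
    """Return True if the book has a child <tag> marked xml:lang="en"."""
    return any(el.get(XML_LANG) == "en" for el in book.findall(tag))


def check_book(book):
    """Return a list of (severity, message) pairs for one <book> element."""
    findings = []
    isbn = book.findtext("isbn", default="").strip()
    # Context: the conditional rule only applies to ISBNs with this prefix.
    if isbn.startswith("978-4-711"):
        for tag in ("title", "blurb"):
            # Assertion: an English version of the element must be present.
            if not has_english(book, tag):
                findings.append(("error", f"{isbn}: English {tag} is missing"))
    # Report: a harmless issue is flagged as a warning rather than an error.
    text = "".join(book.itertext())
    for abbr in DISCOURAGED:
        if abbr in text:
            findings.append(
                ("warning", f"{isbn}: discouraged abbreviation {abbr!r}")
            )
    return findings


def check_catalog(xml_text):
    """Check every <book> in the document and collect all findings."""
    root = ET.fromstring(xml_text)
    return [f for book in root.iter("book") for f in check_book(book)]


if __name__ == "__main__":
    for severity, message in check_catalog(SAMPLE):
        print(f"[{severity}] {message}")
```

Unlike a DTD, the check does not return a single valid or invalid verdict. Each finding carries its own severity and a readable message, so a downstream process can decide which findings block publication and which are merely reported.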

Schematron requires XSLT, or at least XPath, because the conditions are formulated in XPath and checked with an XSLT processor. Schematron is particularly powerful in combination with an XSLT 2.0 processor such as Saxon. Only then can comprehensive analyses of .docx or IDML documents be carried out efficiently. The Schematron messages, which can be formulated in natural language, can be written back into the checked data as comments via XSLT. This enables checks of manuscript and typesetting data as described under Data checking.
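The idea of writing messages back into the checked data as comments can be pictured as follows. The route described above uses XSLT; this sketch uses Python's standard library instead, purely to illustrate the principle. The sample document, the "empty paragraph" check, the message text, and the function name annotate_empty_paras are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second paragraph is empty.
SAMPLE = (
    "<chapter>"
    "<para>Intro text</para>"
    "<para></para>"
    "<para>More text</para>"
    "</chapter>"
)

MESSAGE = " WARNING: empty paragraph "


def annotate_empty_paras(root):
    """Insert a comment before every empty <para>; return how many were added."""
    added = 0
    # Snapshot the elements first, because the tree is modified in the loop.
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = not "".join(child.itertext()).strip()
            if child.tag == "para" and is_empty:
                # Place the natural-language message directly before the
                # element it refers to.
                index = list(parent).index(child)
                parent.insert(index, ET.Comment(MESSAGE))
                added += 1
    return added


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    count = annotate_empty_paras(root)
    print(count, "comment(s) added")
    print(ET.tostring(root, encoding="unicode"))
```

Because each message sits next to the element that triggered it, someone working on the manuscript or typesetting data can find and fix the problem without consulting a separate report.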