In most cases, the most expensive part of typesetting is data normalization, all the more so if electronic products are to be generated in addition to a print product. To keep normalization and conversion costs down, the data should be checked at several stages of the process.

Manuscript data

While pre-flight checks for print PDFs are taken for granted at the end of the process, suitable tools are often lacking at the start. Many publishers have their own Word templates and XML DTDs, and sometimes the Word template has already been checked for convertibility to the XML structure.

However, document templates with built-in checking mechanisms always involve a trade-off: on the one hand, you want to prevent certain errors, such as skipping a level in the section hierarchy; on the other, it must be possible to mark up a broad range of manuscripts with the same template, and there are always exceptions. The template must therefore be neither too strict nor too permissive.
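One way to resolve this trade-off is to keep the template permissive and report problems as warnings instead of blocking the author. The sketch below is purely illustrative (the function name and message format are invented, and it assumes the heading levels have already been extracted from the Word file, e.g. from paragraph style names):

```python
def check_heading_levels(levels):
    """Return warnings for skipped section hierarchy levels.

    `levels` is the sequence of heading levels (1 = top level) in
    document order. A jump of more than one level downwards, e.g. a
    Heading 3 directly after a Heading 1, is flagged as a warning
    rather than rejected outright, so unusual manuscripts still pass.
    """
    warnings = []
    previous = 0
    for position, level in enumerate(levels):
        if level > previous + 1:
            warnings.append(
                f"heading {position + 1}: level {level} follows "
                f"level {previous} (level {previous + 1} skipped)"
            )
        previous = level
    return warnings

print(check_heading_levels([1, 2, 3, 2]))  # no skipped levels -> []
print(check_heading_levels([1, 3]))        # Heading 3 right after Heading 1
```

Because the check only emits warnings, the same generic template can serve strict and liberal series alike; the severity of each rule can be configured per publication.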

Manuscript markup requirements can also vary from series to series, but maintaining a separate template version for each series is undesirable. Rules may also need to be adapted or added over time, and then authors who are still working with the old template must be persuaded to switch to the new one.

Ultimately, it is often the case that no one checks whether the rules have actually been followed, or even whether the authors have used the template at all. The data is sent unchecked to the service provider, and the normalization costs are incurred there.

In our experience, it is often advisable to let authors work with a highly generic Word template and to check compliance with the guidelines, including publication-specific ones, on the server side after the file is uploaded. Errors and warnings can then be flagged as comments in the Word file (see screenshot for DIN standards checking). Even if you would rather spare authors this kind of checking, it is still useful for production editors, copy editors, and editorial managers at the publishing house, because it lets them check the quality of their own work and provides a traceable basis for time and cost estimates when production is outsourced.

Digitized data

le-tex often assigns standardized, cost-sensitive digitization jobs above a specified minimum size to sub-contractors in Central and Eastern Europe or Asia. For a long time, quality assurance of table entry was particularly difficult, because the service providers first had to familiarize themselves with each customer's XML dialect (e.g. the permissible inline markup in CALS tables) and frequently lacked tools for visually checking these dialects.

For entering new data, le-tex therefore uses XHTML tables, which must, however, comply with additional strict rules so that they can be converted unambiguously into the relevant XML target formats. Any service provider can verify compliance with these rules using a freely available Schematron checking script, and XHTML also permits direct visual quality control in the browser.
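To give a flavour of such rules, here is a minimal sketch of two checks in the spirit of the Schematron script, written in Python with the standard library. The whitelist of inline elements and the rules themselves are assumptions for illustration; the real rule set is customer-specific:

```python
import xml.etree.ElementTree as ET

# Assumed whitelist of inline markup permitted inside table cells.
ALLOWED_INLINE = {"b", "i", "sup", "sub", "br"}

def check_table(xhtml_table):
    """Check an XHTML table against two illustrative rules:
    1. every row must span the same number of columns (colspan counted);
    2. cells may contain only whitelisted inline elements.
    Returns a list of error messages (empty if the table passes)."""
    errors = []
    table = ET.fromstring(xhtml_table)
    widths = set()
    for row in table.iter("tr"):
        # Sum of colspans gives the effective width of the row.
        width = sum(int(cell.get("colspan", "1"))
                    for cell in row if cell.tag in ("td", "th"))
        widths.add(width)
        for cell in row:
            for el in cell.iter():
                if el is not cell and el.tag not in ALLOWED_INLINE:
                    errors.append(f"disallowed element <{el.tag}> in cell")
    if len(widths) > 1:
        errors.append(f"inconsistent row widths: {sorted(widths)}")
    return errors

good = ("<table><tr><th>a</th><th>b</th></tr>"
        "<tr><td>1</td><td><i>2</i></td></tr></table>")
bad = ("<table><tr><td colspan='2'>a</td></tr>"
       "<tr><td><div>1</div></td></tr></table>")
print(check_table(good))  # []
print(check_table(bad))   # flags <div> and the 2-vs-1 column mismatch
```

Checks like these catch exactly the ambiguities that would otherwise make conversion to a CALS or customer-specific table model unreliable.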

Similar import checks exist for image and LaTeX data. The main difference from manuscript checking is that service providers can be given more technical detail than authors.

Typesetting data

If high-quality XML files or EPUBs are to be generated automatically from the InDesign data supplied by a typesetting team, the data must fulfil certain criteria, e.g. regarding the consistent use of paragraph and character styles.
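A typical structurability criterion is that every style used in the document maps to an element in the target structure. The following sketch illustrates the idea; the style names and the mapping are invented for illustration and would be defined per customer workflow:

```python
# Hypothetical mapping from InDesign paragraph styles to XML elements.
STYLE_TO_ELEMENT = {
    "chapter-title": "title",
    "body-text": "p",
    "extract": "blockquote",
    "caption": "caption",
}

def check_styles(used_styles):
    """Return the styles used in the document that cannot be mapped
    to an XML element and would therefore block automatic export."""
    return sorted(set(used_styles) - STYLE_TO_ELEMENT.keys())

print(check_styles(["chapter-title", "body-text", "body-text"]))  # []
print(check_styles(["body-text", "Basic Paragraph", "my-style-2"]))
```

Unmapped styles (such as ad-hoc overrides or InDesign's default "Basic Paragraph") are reported before export, rather than surfacing later as gaps in the XML or EPUB.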

le-tex provides online checking services for this, too. They are adapted to each customer's workflow and supplement the print-related pre-flight checks with a structurability check. They thus offer an alternative to InDesign XML workflows, which are usually more expensive than conventional typesetting and require compromises in many places.