The open source framework for converting and checking documents

There are risks involved in choosing a particular software product. Will the manufacturer still exist in five years’ time? Will the manufacturer continue to develop the product? Can other providers service the installation? Is it possible to have the suitability of the product independently verified?

le-tex’s decision, first to work with accepted standard technologies and second to make its products open source, removes all these risks for the customer. This applies to converters from and to XML-based formats such as .docx, IDML, EPUB, HTML, DocBook, TEI and NLM / JATS / BITS, and to extensions, e.g., for checking PDF and image files.

le-tex transpect comprises not only the open source modules but also the methodology for combining them into workflows. Examples of these workflows include conversion from InDesign (IDML) to publisher-specific XML and then to EPUB, or conversion of Word manuscripts (.docx) to XML and then to IDML for a first coarse pagination in InDesign.

Key components of the le-tex transpect methodology are:

Configuration cascade

Different transformation rules and additional checking rules can be stored for each imprint, book series or even for each individual title.
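The lookup behind such a cascade can be sketched as follows. This is a minimal illustration, not transpect's actual implementation: the directory layout, the level names and the file names are assumptions made for the example. The idea is that the most specific existing override (title, then series, then imprint, then the common default) wins.

```python
from pathlib import PurePosixPath

# Hypothetical cascade levels, from most general to most specific.
LEVELS = ("imprint", "series", "title")


def candidate_dirs(base, **params):
    """Return configuration directories from most specific to most general."""
    path = PurePosixPath(base)
    dirs = [path]
    for level in LEVELS:
        value = params.get(level)
        if value is None:
            break  # deeper levels need their parent level to be set
        path = path / value
        dirs.append(path)
    return list(reversed(dirs))


def resolve(filename, available, base="conf", **params):
    """Pick the most specific existing override of `filename`, or None."""
    for directory in candidate_dirs(base, **params):
        candidate = str(directory / filename)
        if candidate in available:
            return candidate
    return None


# Files assumed to exist in this made-up project.
available = {
    "conf/evolve-hub.xsl",
    "conf/acme/evolve-hub.xsl",
    "conf/acme/checks.sch",
    "conf/acme/crime/thriller-42/checks.sch",
}

book = {"imprint": "acme", "series": "crime", "title": "thriller-42"}
print(resolve("checks.sch", available, **book))
print(resolve("evolve-hub.xsl", available, **book))
```

In this sketch the title has its own checking rules, while the transformation rules fall back to the imprint level because neither the series nor the title overrides them.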

Schematron and schema checking

Messages are displayed at the error location in an HTML view of the document. Schematron allows detection of errors that a DTD validation is unable to spot: typesetting or editorial errors such as unanchored marginal notes and illustrations, pseudo headings without proper styles, or uncited references.
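To illustrate the kind of rule meant here, the following sketch finds uncited references. It is written in plain Python rather than Schematron, and the element names, attribute names and sample document are invented for the example. Each message is keyed by a srcpath-style location so that it could later be attached to a rendering of the document.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; element and attribute names are illustrative only.
SAMPLE = """
<article>
  <body>
    <p srcpath="/article/body/p[1]">See <xref rid="r1"/>.</p>
  </body>
  <ref-list>
    <ref id="r1" srcpath="/article/ref-list/ref[1]">Cited work</ref>
    <ref id="r2" srcpath="/article/ref-list/ref[2]">Uncited work</ref>
  </ref-list>
</article>
"""


def uncited_references(root):
    """Return (srcpath, message) pairs for references that no xref points to."""
    cited = {xref.get("rid") for xref in root.iter("xref")}
    messages = []
    for ref in root.iter("ref"):
        if ref.get("id") not in cited:
            text = f"Reference '{ref.get('id')}' is never cited."
            messages.append((ref.get("srcpath"), text))
    return messages


root = ET.fromstring(SAMPLE)
for srcpath, message in uncited_references(root):
    print(srcpath, message)
```

A DTD can verify that every cross-reference points to an existing ID, but it cannot express the reverse condition checked here, namely that every reference is pointed to at least once.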

The technologies behind these converters and checks are the W3C standards XSLT 2.0 and XProc and the ISO standards Schematron and Relax NG.

An example of a le-tex transpect-based converter solution suiting all publishers’ settings is shown in the figure above (see also the Hogrefe reference).

le-tex supports you in installing le-tex transpect and adapting it to customer-specific requirements, and also helps integrate it into established processes. Training and support complete the offer. Thus, le-tex transpect ensures that you will be able to stay with your approved typesetting service and all related workflows in the future.

le-tex’s open source policy goes a step further: in principle, any other XML consultant can service and refine the converting application. This reflects le-tex’s philosophy of establishing customer loyalty through technology leadership and service rather than by creating dependency on proprietary solutions.

While le-tex encourages customers to give the code that they commissioned back to the community, they are not obliged to. transpect’s BSD licence allows them to keep customizations and enhancements closed, so transpect can be used in areas where trade secrets play a role.

Module Structure and Revision Control

Each transpect installation typically includes the required modules by means of revision control externals (svn externals or git submodules). Each transpect module will identify itself by its canonical URL. Whenever a module’s file is requested (for example, by XProc or XSLT import), it will be referenced by its canonical URL. Because of that, the actual physical location of each module within a project is less important, provided that there is a central catalog that imports the modules’ catalogs by means of nextCatalog. We call this robust installation mechanism “external+catalog import.”
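The resolution step can be sketched as follows, assuming OASIS XML catalogs with rewriteURI entries in each module and nextCatalog entries in the central catalog. The catalog paths and the example.org URLs are invented for illustration, the catalogs are held in memory instead of being read from disk, and the matching is simplified (first matching prefix wins), so this is not a complete catalog resolver.

```python
import posixpath
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

# In-memory stand-ins for catalog files; paths and URLs are invented.
CATALOGS = {
    "xmlcatalog/catalog.xml": """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <nextCatalog catalog="../docx2hub/xmlcatalog/catalog.xml"/>
  <nextCatalog catalog="../xproc-util/xmlcatalog/catalog.xml"/>
</catalog>""",
    "docx2hub/xmlcatalog/catalog.xml": """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteURI uriStartString="http://example.org/modules/docx2hub/" rewritePrefix="../"/>
</catalog>""",
    "xproc-util/xmlcatalog/catalog.xml": """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteURI uriStartString="http://example.org/modules/xproc-util/" rewritePrefix="../"/>
</catalog>""",
}


def resolve_url(url, catalog_path="xmlcatalog/catalog.xml"):
    """Map a canonical URL to a local path, following nextCatalog entries."""
    root = ET.fromstring(CATALOGS[catalog_path])
    base = posixpath.dirname(catalog_path)
    # First try this catalog's own rewrite rules.
    for entry in root.iter(NS + "rewriteURI"):
        start = entry.get("uriStartString")
        if url.startswith(start):
            local = posixpath.join(base, entry.get("rewritePrefix"), url[len(start):])
            return posixpath.normpath(local)
    # Then delegate to the imported module catalogs, in document order.
    for entry in root.iter(NS + "nextCatalog"):
        next_path = posixpath.normpath(posixpath.join(base, entry.get("catalog")))
        found = resolve_url(url, next_path)
        if found is not None:
            return found
    return None


print(resolve_url("http://example.org/modules/docx2hub/xpl/docx2hub.xpl"))
```

Because each module's catalog rewrites its canonical URL relative to its own location, moving a module inside the project only requires updating the corresponding nextCatalog entry in the central catalog.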

Revision-controlled modules and the externals mechanism make it possible to pin certain module revisions so that production lines won’t suffer from incompatibilities introduced by module updates. The downside is that these production lines won’t benefit from updates unless someone increases the revision number to be used for that external. In order to detect incompatibilities in production lines automatically, we are using a continuous integration server. For each production line there may be a test data set. Whenever a dependency module is updated, the test data will be converted without that update and then with that update in place. If the outcomes differ, the project’s and the module’s maintainers will be informed and they will try to find a resolution: either fix the module or enhance other parts of the project’s pipelines so that the conversion output is the same again. Or they may simply determine that the outcome needed to change and that the update brought about exactly that change.
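The before/after comparison can be sketched like this. The two converter functions are stand-ins for running a production line with the pinned and with the updated module revision; they, the test data and the notification text are assumptions for illustration, not part of the actual continuous integration setup.

```python
import difflib


def compare_outputs(before, after):
    """Return a unified diff of two conversion results, or [] if they match."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="pinned-revision", tofile="updated-revision", lineterm=""))


def check_production_line(convert_pinned, convert_updated, test_files):
    """Run both converter versions on each test file and collect differences."""
    report = {}
    for name, content in test_files.items():
        diff = compare_outputs(convert_pinned(content), convert_updated(content))
        if diff:
            report[name] = diff
    return report


# Stand-in converters: the "update" changes how emphasis is rendered.
def convert_pinned(text):
    return text.replace("*", "<i>", 1).replace("*", "</i>", 1)


def convert_updated(text):
    return text.replace("*", "<em>", 1).replace("*", "</em>", 1)


test_files = {"plain.txt": "No markup here.", "emph.txt": "Some *stressed* word."}
report = check_production_line(convert_pinned, convert_updated, test_files)
for name, diff in report.items():
    print(f"{name}: outputs differ; notify the project's and the module's maintainers")
    print("\n".join(diff))
```

Only files whose output actually changes end up in the report, so the maintainers can decide per file whether the difference is a regression or the intended effect of the update.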


In our Subversion Repository, you will find a demo project and a setup manual (work in progress) for transpect projects.

Please consider the note about externals below when accessing the demo project link.

There is a simple Web interface where you can try out this and other demo applications without having to install them. As a sample file you may use the transpect white paper (in German).

In combination with le-tex aspect, you can access this interface’s roles / permissions management and project administration / progress monitoring functions. Integration into your content management system is also possible via an HTTP API.

XML Prague 2014 Slides

le-tex presented transpect at XML Prague 2014. See the slides here.

Detailed List of Modules and Standalone Tools

We are currently hosting the modules on our own Subversion server. In order to facilitate contributions from outside our company, we are considering moving everything to GitHub in the near future.

Please note: The links to standalone tools and demo applications point to our Subversion server. You can browse the repositories, but you don’t see the externals for each standalone tool / project. In order to check out a project, you have to use something like:

svn checkout docx_modify

| Module | Standalone tool? | Configurable? | Remark |
|---|---|---|---|
| calabash | yes (Bash, .bat) | yes (Extensions, XML catalogs, Saxon edition) | Ready-to-run version of the XProc processor XML Calabash, with libraries for unzipping, Relax NG validation with XPath error locations, image resolution/color space/… detection. It is included via svn external in standalone projects and makes each project’s XML catalog entries available to the pipelines. |
| Container format→Hub XML | | | |
| idml2xml | yes (Calabash, Make) | no | Convert IDML to Hub XML. If the IDML contains InDesign’s piggybacked XML tagging, it may alternatively extract this XML. |
| docx2hub | yes (Calabash, Make) | no | Convert OOXML (.docx) to Hub XML. |
| epub2hub | yes (Calabash, Make, Bash, .bat) | no (to a limited extent with XSLT) | Convert EPUB to Hub XML (uses html2hub). |
| docx_modify | yes (Calabash, Bash) | yes (XSLT, XProc) | Manipulate .docx files with XSLT, focusing on transforming the content rather than creating the zip structure. Sample applications include XSLT for replacing non-Unicode Symbol or Wingdings characters with their Unicode equivalents from Arial Unicode MS (or from another font, subject to your template modifications). |
| hub2docx | yes (Calabash, Bash) | yes (XSLT, XProc, .docx template; cascading) | Convert Hub XML to OOXML (.docx) (uses docx_modify). |
| xml2idml | yes | yes (XSLT, XProc, CSS, IDML template; cascading) | IDML synthesis (typically via XHTML+CSS preview). |
| epubtools | yes (Calabash, Make) | yes (CSS, heading/splitting configuration; cascading) | EPUB generator (input: XHTML). Generates EPUB 2 or 3; is able to split the HTML files dynamically according to each unit’s text length; will generate usable names for files with a query string in their URL; creates EPUBs with audio overlays, … |
| htmltemplates | no | yes (cascading) | HTML templates for assembling e-books from boilerplate text, body content and metadata. A link <a rel="transclude" href="#…"> processes the template with the referenced ID; <a rel="calc" name="…"> invokes an <xsl:call-template name="…">. The latter mechanism is good for iterations, conditional text or rendering of metadata. It is possible to call named HTML templates from the named XSLT templates, handing control back to the HTML mechanism. Unlike the XSLT templates, the HTML templates may be maintained by ordinary users, in a text editor or in a future Web-based EPUB configurator. |
| css-expand | yes (Calabash) | yes (XSLT, XProc, .docx template; cascading) | CSS→XML parser and conversion from CSS to CSSa. The standalone tool is an example of checking CSSa properties with Schematron; you may supply your own Schematron. |
| css-generate | yes (Calabash, Bash) | yes (XSLT, XProc, .docx template; cascading) | CSS serialization from an XML representation. |
| hub2html | no | yes (XSLT, XProc; cascading) | Convert Hub XML to HTML. This module’s XSLT will also be used by other modules for filtering and transforming CSSa layout information (for example, creating <b> or <i> wrapping elements from css:font-weight or css:font-style attributes, or ignoring boldface CSSa in title elements). |
| html2hub | yes (Calabash, Make, Bash, .bat) | no (limited through XSLT) | Convert HTML to Hub XML. |
| html-tables | yes (Calabash) | no | This implementation of Andrew Welch’s HTML table normalizing algorithm inserts the cell origin coordinates on the physical grid as data-… attributes. |
| letex-util | no | no | XSLT functions for converting between lengths, colors, and hex numbers, for mapping file extensions to MIME types and for normalizing CALS/OASIS tables (analogous to html-tables). |
| Hierarchize Hub XML | | | |
| evolve-hub | no | yes (XSLT, XProc; cascading) | Use style information (style names, in particular) for up-converting flat Hub XML as it stems from IDML or .docx. Apart from sectional hierarchies, it deals with nesting lists, grouping tables and graphics with their captions, etc. |
| Core XProc libraries | | | |
| pubcoach | no | yes (configuration file, XSLT – import & extend, XProc) | Central XProc/XSLT library that implements the configuration cascade. Apart from that: diverse utilities such as Schematron or Relax NG validations that store a “@srcpath” attribute with every error message. @srcpath serves as a key to attach the messages of many conversion stages to a rendering of the source document. “pubcoach” stands for “PUBlishing COnversion And CHecking”, which used to be the framework’s name before it was dubbed “transpect.” |
| xproc-util | no | no | Libraries for linear, one-mode-per-pass XSLT conversion pipelines, for saving debugging snapshots after each conversion pass and for associating schema information with documents (<?xml-model?>). |
| hub2bits | no | yes (XSLT, XProc; cascading) | Convert Hub XML to JATS/BITS/HoBoTS XML. Uses hub2html’s CSSa wrapper/mapper. |
| hub2tei | no | yes (XSLT, XProc; cascading) | Convert Hub XML to TEI. Uses hub2html’s CSSa wrapper/mapper. |
| jats2html | no | yes (XSLT, XProc; cascading) | Convert JATS/BITS/HoBoTS XML to HTML. Uses hub2html’s CSSa wrapper/mapper. |
| tei2html | no | yes (XSLT, XProc; cascading) | Convert TEI to HTML. Uses hub2html’s CSSa wrapper/mapper. |
| schematron-stdlib | no | yes (cascading) | Typical Schematron checks, for example for unanchored text frames in IDML or for styles that are in the document but not in the template. These checks may be imported and overridden by actual transpect projects. |
| htmlreports | no | no | Patch SVRL (and SVRL that is derived from Relax NG validation, by finding the closest @srcpath for a given error location) into an HTML rendering, based on a common @srcpath attribute. |
| idmlval | yes (Calabash) | no | Validation against the IDML Relax NG schemas, particularly for synthesized IDML. |
| schema | no | no | Schema collection (XHTML, JATS, TEI, …) for external+catalog inclusion. |
| hub2fo | no | no | Convert Hub XML to XSL-FO (based on the contained CSSa). |
| hub2tex | no | yes (mapping configuration file) | Convert Hub to LaTeX. |
| fontlib | no | no | Open source fonts that may be included via their canonical URL from CSS. They will then be included by epubtools (thanks to css-expand). |
| font-maps | no | no | Mapping from non-Unicode fonts (Symbol, Wingdings, …) to Unicode, for inclusion from docx2hub, idml2xml, etc. |
| crossref | yes (Calabash, Make, fetchmail/procmail) | yes (XSLT) | Submit CrossRef batch queries for JATS/BITS/HoBoTS XML and generate scripts that augment the original InDesign files with the retrieved DOIs. |
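
The html-tables entry above mentions cell origin coordinates on the physical grid. The following sketch shows one common way to compute such coordinates for cells with row and column spans. It is a generic illustration in Python under the assumption of a simple (colspan, rowspan) input model; it is not the module's XSLT code and not necessarily the algorithm it implements.

```python
def cell_origins(rows):
    """Compute the (row, column) origin of every cell on the physical grid.

    `rows` is a list of table rows, each a list of (colspan, rowspan) tuples
    in document order. The result has the same shape as `rows`.
    """
    occupied = set()  # grid positions already covered by some cell
    origins = []
    for r, row in enumerate(rows):
        col = 0
        row_origins = []
        for colspan, rowspan in row:
            # Skip positions covered by cells spanning down from earlier rows.
            while (r, col) in occupied:
                col += 1
            row_origins.append((r, col))
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, col + dc))
            col += colspan
        origins.append(row_origins)
    return origins


# Row 0: one cell spanning two rows, then one cell spanning two columns.
# Row 1: two ordinary cells, which start in column 1 because column 0 is taken.
table = [
    [(1, 2), (2, 1)],
    [(1, 1), (1, 1)],
]
print(cell_origins(table))
```

With the origins known, each cell can carry its grid position explicitly, which makes later processing independent of the spans of neighbouring cells.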


Martin Kraetke
Phone: +49 341 355356 143
Fax: +49 341 355356 543

Gerrit Imsieke
Phone: +49 341 355356 110
Fax: +49 341 355356 510