The open source framework for converting and checking documents
There are risks involved in choosing a particular software product. Will the manufacturer still exist in five years’ time? Will the manufacturer continue to develop the product? Can other providers service the installation? Is it possible to have the suitability of the product independently verified?
le-tex’s decision, firstly, to work with accepted standard technologies and, secondly, to make the products open source removes these risks for the customer. This applies to converters from and to XML-based formats such as .docx, IDML, EPUB, HTML, DocBook, TEI and NLM / JATS / BITS, and to extensions, e.g., for checking PDF and image files.
le-tex transpect comprises not only the open source modules but also the methodology for combining them into workflows. Examples of such workflows include the conversion from InDesign (IDML) to publisher-specific XML and then to EPUB, or the conversion of Word manuscripts (.docx) to XML and then to IDML for a first rough pagination in InDesign.
Key components of the le-tex transpect methodology are:
- Configuration cascade
Different transformation rules and additional checking rules can be stored for each imprint, book series or even for each individual title.
- Schematron and schema checking
Messages are displayed at the error location in an HTML view of the document. Schematron can detect errors that DTD validation cannot spot, such as typesetting or editorial errors: unanchored marginal notes and illustrations, pseudo headings without proper styles, or uncited references.
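The configuration cascade can be pictured as a lookup that starts at the most specific level and falls back to increasingly general ones. The following Python sketch only illustrates that idea under assumptions: it supposes a directory hierarchy such as publisher / imprint / series / title, and the names `cascade_dirs`, `resolve` and `evolve.xsl` are hypothetical. In transpect itself, the cascade is implemented in XProc and XSLT (see the pubcoach module), not with this code.

```python
from pathlib import Path, PurePath, PurePosixPath
from typing import Callable, List, Optional, Sequence


def cascade_dirs(root: PurePath, levels: Sequence[str]) -> List[PurePath]:
    """Candidate configuration directories, most specific first.

    For levels ("pub-a", "series-1", "title-42") this yields
    root/pub-a/series-1/title-42, root/pub-a/series-1, root/pub-a, root.
    """
    return [root.joinpath(*levels[:i]) for i in range(len(levels), -1, -1)]


def resolve(
    root: PurePath,
    levels: Sequence[str],
    filename: str,
    exists: Callable[[PurePath], bool] = lambda p: Path(p).is_file(),
) -> Optional[PurePath]:
    """Return the most specific existing copy of filename, or None."""
    for directory in cascade_dirs(root, levels):
        candidate = directory / filename
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical layout: only the series level and the global default
    # provide their own evolve.xsl, so the title inherits the series version.
    present = {
        PurePosixPath("conf/evolve.xsl"),
        PurePosixPath("conf/pub-a/series-1/evolve.xsl"),
    }
    hit = resolve(PurePosixPath("conf"), ("pub-a", "series-1", "title-42"),
                  "evolve.xsl", present.__contains__)
    print(hit)  # conf/pub-a/series-1/evolve.xsl
```

The `exists` parameter is injectable so that the lookup order can be checked without touching the file system; by default it tests for real files.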
The technologies behind these converters and checks are the W3C standards XSLT 2.0 and XProc and the ISO standards Schematron and Relax NG.
An example of a le-tex transpect-based converter solution suiting all publishers’ settings is shown in the figure above (see also the Hogrefe reference).
le-tex supports you in installing le-tex transpect and adapting it to customer-specific requirements, and also helps integrate it into established processes. Training and support complete the offer. Thus, le-tex transpect ensures that you will be able to stay with your approved typesetting service and all related workflows in the future as well.
le-tex’s open source policy goes one step further: in principle, any other XML consultant can service and refine the conversion application. This reflects le-tex’s philosophy of earning customer loyalty through technology leadership and service rather than through dependency on proprietary solutions.
While le-tex encourages customers to contribute the code they commissioned back to the community, they are not obliged to. transpect’s BSD licence allows them to keep customizations and enhancements closed, so transpect can also be used in areas where trade secrets play a role.
Module Structure and Revision Control
Each transpect installation typically includes the required modules by means of revision control mechanisms (svn externals or git submodules). Each transpect module identifies itself by its canonical URL: whenever one of a module’s files is requested (for example, by an XProc or XSLT import), it is referenced by that canonical URL.
Because of that, the actual physical location of each module within a project is less important, provided that there is a central catalog that imports the modules’ catalogs by means of nextCatalog. We call this robust installation mechanism “external+catalog import.”
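To illustrate why the physical location matters less, here is a minimal Python sketch of catalog-based resolution. It handles only two elements of the OASIS XML Catalogs format, `rewriteURI` and `nextCatalog`, and it assumes that catalogs are passed in as a mapping from each catalog’s location to its XML text. The function `resolve_uri` and the `example.org` URLs are hypothetical; in practice, the XProc and XSLT processors rely on a complete catalog resolver.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Optional, Tuple
from urllib.parse import urljoin

# Namespace of OASIS XML Catalogs documents
CATALOG_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"


def resolve_uri(uri: str, catalog_uri: str,
                catalogs: Mapping[str, str]) -> Optional[str]:
    """Resolve a canonical URI via rewriteURI entries and nextCatalog chains.

    `catalogs` maps each catalog's own (absolute) location to its XML text.
    Only rewriteURI and nextCatalog are supported; there is no cycle detection.
    """
    text = catalogs.get(catalog_uri)
    if text is None:
        return None  # unavailable catalogs are simply skipped
    root = ET.fromstring(text)

    # Within one catalog, the longest matching uriStartString wins.
    best: Optional[Tuple[str, str]] = None
    for rule in root.iter(CATALOG_NS + "rewriteURI"):
        start = rule.get("uriStartString", "")
        if start and uri.startswith(start) and (best is None or len(start) > len(best[0])):
            best = (start, rule.get("rewritePrefix", ""))
    if best is not None:
        start, prefix = best
        # A relative rewritePrefix is resolved against the catalog's location.
        return urljoin(catalog_uri, prefix) + uri[len(start):]

    # No match here: delegate to the next catalogs in document order.
    for entry in root.iter(CATALOG_NS + "nextCatalog"):
        next_uri = urljoin(catalog_uri, entry.get("catalog", ""))
        found = resolve_uri(uri, next_uri, catalogs)
        if found is not None:
            return found
    return None
```

With a central catalog whose only entry is `<nextCatalog catalog="../docx2hub/xmlcatalog/catalog.xml"/>` and a module catalog that rewrites `http://example.org/docx2hub/` to `../`, a request for `http://example.org/docx2hub/xsl/main.xsl` resolves to the module’s local copy. Only the central catalog needs to know where each module was checked out.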
Revision-controlled modules and the externals mechanism make it possible to pin specific module revisions, so that production lines won’t suffer from incompatibilities introduced by module updates. The downside is that these production lines won’t benefit from updates unless someone raises the revision number used for that external. In order to detect incompatibilities in production lines automatically, we use a continuous integration server. For each production line there may be a test data set. Whenever a dependency module is updated, the test data is converted first without and then with that update in place. If the outcomes differ, the maintainers of the project and of the module are informed and try to find a resolution: either fix the module or adapt other parts of the project’s pipelines so that the conversion output is the same again. Alternatively, they may determine that the output was supposed to change and that the update produced exactly that intended change.
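The before/after comparison could be sketched as follows. This Python code is an illustration under assumptions, not the actual CI setup: it supposes that each conversion run writes its results into a separate directory, and it compares the runs byte for byte via SHA-256 digests. The names `snapshot`, `diff_snapshots` and `needs_review` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Mapping


def snapshot(output_dir: Path) -> Dict[str, str]:
    """Map each output file (relative POSIX path) to the SHA-256 digest of its bytes."""
    return {
        p.relative_to(output_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(output_dir.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(before: Mapping[str, str],
                   after: Mapping[str, str]) -> Dict[str, List[str]]:
    """Compare the run without the module update to the run with it."""
    return {
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
        "missing": sorted(before.keys() - after.keys()),
        "new": sorted(after.keys() - before.keys()),
    }


def needs_review(report: Mapping[str, List[str]]) -> bool:
    """True if the project's and the module's maintainers should be notified."""
    return any(report.values())
```

A CI job would call `snapshot` on the output directory of each run and pass both results to `diff_snapshots`. Outputs that embed timestamps or generated IDs would have to be normalized first, otherwise every run would be reported as changed.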
Please consider the note about externals below when accessing the demo project link.
In combination with le-tex aspect, you can access this interface’s roles / permissions management and project administration / progress monitoring functions. Integration into your content management system is also possible via an HTTP API.
XML Prague 2014 Slides
Detailed List of Modules and Standalone Tools
We currently host the modules on our own Subversion server. In order to facilitate contributions from outside our company, we are considering moving everything to GitHub in the near future.
Please note: the links to standalone tools and demo applications point to our Subversion server. You can browse the repositories there, but the externals of each standalone tool / project are not visible. In order to check out a project, use a command such as:
svn checkout https://subversion.le-tex.de/docxtools/trunk/docx_modify/frontend/
|calabash||yes (Bash, .bat)||yes (Extensions, XML catalogs, Saxon edition)||Ready-to-run version of the XProc processor XML Calabash, with libraries for unzipping, Relax NG validation with XPath error locations, image resolution/color space/… detection. Is included via svn external in standalone projects and makes each project’s XML catalog entries available to the pipelines.|
|Container format→Hub XML|
|idml2xml||yes (Calabash, Make)||no||Convert IDML to Hub XML. If the IDML contains InDesign’s piggybacked XML tagging, it may alternatively extract this XML.|
|docx2hub||yes (Calabash, Make)||no||Convert OOXML (.docx) to Hub XML|
|epub2hub||yes (Calabash, Make, Bash, .bat)||no (to a limited extent with XSLT)||Convert EPUB to Hub XML (uses html2hub)|
|docx_modify||yes (Calabash, Bash)||yes (XSLT, XProc)||Manipulate .docx files with XSLT, focusing on transforming the content rather than creating the zip structure. Sample applications include XSLT for replacing non-Unicode Symbol or Wingdings characters with their Unicode equivalents from Arial Unicode MS (or from another font, subject to your template modifications).|
|hub2docx||yes (Calabash, Bash)||yes (XSLT, XProc, .docx template; cascading)||Convert Hub XML to OOXML (.docx) (uses docx_modify)|
|xml2idml||yes||yes (XSLT, XProc, CSS, IDML template; cascading)||IDML synthesis (typically via XHTML+CSS preview)|
|epubtools||yes (Calabash, Make)||yes (CSS, heading/splitting configuration; cascading)||EPUB generator (input: XHTML). Generates EPUB 2 or 3; is able to split the HTML files dynamically according to each unit’s text length; generates usable names for files with a query string in their URL; creates EPUBs with audio overlays, …|
|htmltemplates||no||yes (cascading)||HTML templates for assembling e-books from boilerplate text, body content and metadata.|
|css-expand||yes (Calabash)||yes (XSLT, XProc, .docx template; cascading)||CSS→XML parser and conversion from CSS to CSSa. The standalone tool serves as an example of checking CSSa properties with Schematron; you may supply your own Schematron.|
|css-generate||yes (Calabash, Bash)||yes (XSLT, XProc, .docx template; cascading)||CSS serialization from an XML representation|
|hub2html||no||yes (XSLT, XProc; cascading)||Convert Hub XML to HTML. This module’s XSLT will also be used by other modules for filtering and transforming CSSa layout information (for example, creating
|html2hub||yes (Calabash, Make, Bash, .bat)||no (limited through XSLT)||Convert HTML to Hub XML.|
|html-tables||yes (Calabash)||no||This implementation of Andrew Welch’s HTML table normalizing algorithm inserts the cell origin coordinates on the physical grid as data-… attributes.|
|letex-util||no||no||XSLT functions for converting between lengths, colors, and hex numbers, for mapping file extensions to MIME types and for normalizing CALS/OASIS tables (analogous to html-tables)|
|Hierarchize Hub XML|
|evolve-hub||no||yes (XSLT, XProc; cascading)||Use style information (style names, in particular) for up-converting flat Hub XML as it stems from IDML or .docx. Apart from sectional hierarchies, it deals with nesting lists, grouping tables and graphics with their captions, etc.|
|Core XProc libraries|
|pubcoach||no||yes (configuration file, XSLT – import & extend, XProc)||Central XProc/XSLT library that implements the configuration cascade. Apart from that: utilities such as Schematron or Relax NG validations that store a “
|xproc-util||no||no||Libraries for linear, one-mode-per-pass XSLT conversion pipelines, for saving debugging output after each conversion pass and for associating schema information with documents|
|hub2bits||no||yes (XSLT, XProc; cascading)||Convert Hub XML to JATS/BITS/HoBoTS XML. Uses hub2html’s CSSa wrapper/mapper.|
|hub2tei||no||yes (XSLT, XProc; cascading)||Convert Hub XML to TEI. Uses hub2html’s CSSa wrapper/mapper.|
|jats2html||no||yes (XSLT, XProc; cascading)||Convert JATS/BITS/HoBoTS XML to HTML. Uses hub2html’s CSSa wrapper/mapper.|
|tei2html||no||yes (XSLT, XProc; cascading)||Convert TEI to HTML. Uses hub2html’s CSSa wrapper/mapper.|
|schematron-stdlib||no||yes (cascading)||Typical Schematron checks, for example, for unanchored text frames in IDML or for styles that are used in the document but are missing from the template. These checks may be imported and overridden by actual transpect projects.|
|htmlreports||no||no||Patch SVRL (and SVRL that is derived from Relax NG validation, by finding the closest @srcpath for a given error location) into an HTML rendering, based on a common @srcpath attribute.|
|idmlval||yes (Calabash)||no||Validation against the IDML Relax NG Schemas, particularly for synthesized IDML.|
|schema||no||no||Schema collection (XHTML, JATS, TEI, …) for external+catalog inclusion.|
|hub2fo||no||no||Convert Hub XML to XSL-FO (based on the contained CSSa)|
|hub2tex||no||yes (mapping configuration file)||Convert Hub to LaTeX.|
|fontlib||no||no||Open source fonts that may be included via their canonical URL from CSS. They will then be included by epubtools (thanks to css-expand).|
|font-maps||no||no||Mapping from non-Unicode fonts (Symbol, Wingdings, …) to Unicode, for inclusion from docx2hub, idml2xml, etc.|
|crossref||yes (Calabash, Make, fetchmail/procmail)||yes (XSLT)||Submit CrossRef batch queries for JATS/BITS/HoBoTS XML and generate scripts that augment the original InDesign files with the retrieved DOIs.|
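The grid normalization that html-tables performs can be illustrated with a small occupancy-grid computation. The Python sketch below is a generic version of the idea, not a port of Andrew Welch’s XSLT: it walks the rows in document order, skips grid slots already covered by rowspans from earlier rows, and records each cell’s origin as a 1-based (row, column) pair instead of writing data-… attributes. `cell_origins` is a hypothetical name; nested tables and `rowspan="0"` are not handled.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple


def cell_origins(table: ET.Element) -> List[List[Tuple[int, int]]]:
    """For every td/th, return its (row, col) origin on the physical grid."""
    occupied: Set[Tuple[int, int]] = set()  # grid slots already covered
    result: List[List[Tuple[int, int]]] = []
    for r, tr in enumerate(table.iter("tr"), start=1):
        row_result: List[Tuple[int, int]] = []
        c = 1
        for cell in tr:
            if cell.tag not in ("td", "th"):
                continue
            # Skip slots that a rowspan from an earlier row reaches into.
            while (r, c) in occupied:
                c += 1
            rowspan = int(cell.get("rowspan", "1"))
            colspan = int(cell.get("colspan", "1"))
            row_result.append((r, c))
            # Mark every slot this cell spans as occupied.
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, c + dc))
            c += colspan
        result.append(row_result)
    return result
```

For a table whose first row holds a cell with `rowspan="2"` followed by a cell with `colspan="2"`, the two cells of the second row get the origins (2, 2) and (2, 3), because slot (2, 1) is still covered by the rowspan.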
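The kind of mapping that font-maps provides for docx2hub and idml2xml can be sketched as a lookup table. The Python excerpt below covers only a handful of characters from the Adobe Symbol encoding. It assumes that the input is the hexadecimal code Word records in the `w:char` attribute of a `w:sym` element, which is often shifted into the private use range U+F000 to U+F0FF. The table layout and the function name `symbol_char_to_unicode` are hypothetical and do not reflect the module’s actual file format.

```python
from typing import Dict, Optional

# Small excerpt of the Adobe Symbol encoding (font code -> Unicode).
# A complete mapping would cover every glyph of the font.
SYMBOL_TO_UNICODE: Dict[int, str] = {
    0x61: "\u03B1",  # alpha
    0x62: "\u03B2",  # beta
    0x64: "\u03B4",  # delta
    0x6D: "\u03BC",  # mu
    0x70: "\u03C0",  # pi
    0xA3: "\u2264",  # less-than or equal to
    0xA5: "\u221E",  # infinity
    0xB1: "\u00B1",  # plus-minus sign
    0xB3: "\u2265",  # greater-than or equal to
    0xB4: "\u00D7",  # multiplication sign
    0xB9: "\u2260",  # not equal to
}


def symbol_char_to_unicode(hex_code: str) -> Optional[str]:
    """Map a Symbol-font character code (e.g. "F061" or "61") to Unicode."""
    code = int(hex_code, 16)
    # Undo the private-use offset that Word often applies to symbol fonts.
    if 0xF000 <= code <= 0xF0FF:
        code -= 0xF000
    return SYMBOL_TO_UNICODE.get(code)
```

Returning `None` for unmapped codes leaves it to the calling conversion to decide whether to keep the original character and report it, for example, through a Schematron check.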