TL;DR
This study analyzes the consistency of TeX-produced documents across different engines and distributions, revealing significant discrepancies and identifying hidden bugs to improve document robustness.
Contribution
It introduces an automated pipeline to evaluate cross-engine and cross-version compatibility of TeX, uncovering hidden bugs and quantifying inconsistencies in the ecosystem.
Findings
Only 0.2% of documents compiled to identical output with XeTeX and PDFTeX.
Only 42.1% of documents produced the same output across TeX distribution versions from 2020 to 2023.
The pipeline surfaced new bugs in LaTeX packages, as well as existing bugs that had already been fixed independently of the study.
Abstract
TeX is a widely used typesetting system adopted by most publishers and professional societies. While TeX is responsible for generating a significant number of documents, irregularities in the TeX ecosystem may produce inconsistent documents. These inconsistencies may occur across different TeX engines or different versions of TeX distributions, resulting in failures to adhere to formatting specifications, or the same document rendering differently for different authors. In this work, we investigate and quantify the robustness of the TeX ecosystem through a large-scale study of 432 documents. We developed an automated pipeline to evaluate the cross-engine and cross-version compatibility of the TeX ecosystem. We found significant inconsistencies in the outputs of different TeX engines: only 0.2% of documents compiled to identical output with XeTeX and PDFTeX due to a lack of cross-engine…
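To make the headline percentages concrete, here is a minimal Python sketch of how an "identical output" rate between two configurations could be computed. It is not the authors' pipeline. The names `fingerprint` and `identical_rate`, the configuration labels, and the toy data are all hypothetical, and it assumes each document has already been compiled and reduced to some normalized rendering (for example extracted text), since raw PDF bytes usually differ through embedded metadata alone.

```python
import hashlib
from typing import Dict


def fingerprint(rendered: bytes) -> str:
    """Hash a normalized rendering (assumed input: extracted text or page pixels)."""
    return hashlib.sha256(rendered).hexdigest()


def identical_rate(outputs: Dict[str, Dict[str, bytes]], a: str, b: str) -> float:
    """Fraction of documents whose renderings under configurations a and b match.

    Documents missing either rendering (e.g. a failed compile) count as
    non-identical. That is an assumption of this sketch, not a rule from the paper.
    """
    if not outputs:
        return 0.0
    same = sum(
        1
        for per_config in outputs.values()
        if a in per_config
        and b in per_config
        and fingerprint(per_config[a]) == fingerprint(per_config[b])
    )
    return same / len(outputs)


# Hypothetical toy corpus: document id -> configuration label -> normalized rendering.
toy = {
    "doc1": {"pdftex": b"Hello world", "xetex": b"Hello world"},
    "doc2": {"pdftex": b"Hello world", "xetex": b"Hello  world"},  # spacing differs
    "doc3": {"pdftex": b"abc"},  # no xetex rendering available
}

rate = identical_rate(toy, "pdftex", "xetex")
print(f"{rate:.1%}")  # prints "33.3%"
```

The same function would cover the cross-version comparison by using distribution years as the configuration labels. The outcome depends heavily on how renderings are normalized before hashing, which this sketch leaves open.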
