LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement
Nan Jiang, Shanchao Liang, Chengxiao Wang, Jiannan Wang, Lin Tan

TL;DR
LATTE is an iterative refinement framework that uses delta-view feedback to improve LaTeX source recognition for complex formulae and tables in PDF images, outperforming existing methods, including GPT-4V.
Contribution
The paper presents LATTE, the first iterative LaTeX recognition framework that uses delta-view feedback for fault localization and refinement, improving accuracy for formulae and tables.
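The refinement idea can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes the delta-view is a pixel-wise difference between the target image and the rendering of the current LaTeX guess, and the names `delta_view`, `first_fault`, `refine_loop`, `toy_render`, and `toy_refine` are hypothetical. In the real system the generator and refiner would be learned models and the renderer a LaTeX compiler; here they are stand-in functions over tiny binary images.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]  # toy binary image: 1 = ink, 0 = background


def delta_view(target: Image, rendered: Image) -> Image:
    """Assumed delta-view: mark pixels where two same-sized images disagree."""
    return [[int(t != r) for t, r in zip(t_row, r_row)]
            for t_row, r_row in zip(target, rendered)]


def first_fault(delta: Image) -> Optional[Tuple[int, int]]:
    """Return the (row, col) of the first differing pixel, or None if none."""
    for y, row in enumerate(delta):
        for x, value in enumerate(row):
            if value:
                return (y, x)
    return None


def refine_loop(target: Image,
                generate: Callable[[Image], str],
                render: Callable[[str], Image],
                refine: Callable[[Image, str, Image], str],
                max_rounds: int = 3) -> Tuple[str, int]:
    """Generate once, then refine until the rendering matches the target.

    Returns the final source and the number of refinement rounds applied.
    """
    source = generate(target)
    for round_no in range(max_rounds):
        delta = delta_view(target, render(source))
        if first_fault(delta) is None:  # rendering already matches
            return source, round_no
        source = refine(target, source, delta)
    return source, max_rounds


# Hypothetical stand-ins: each character renders to one pixel ('x' = ink).
def toy_render(source: str) -> Image:
    return [[1 if ch == "x" else 0 for ch in source]]


def toy_refine(target: Image, source: str, delta: Image) -> str:
    """Fix only the character at the first fault located by the delta-view."""
    fault = first_fault(delta)
    if fault is None:
        return source
    y, x = fault
    fixed = "x" if target[y][x] else "_"
    return source[:x] + fixed + source[x + 1:]


if __name__ == "__main__":
    target_image = toy_render("xx_x")
    # The stand-in generator makes one mistake, which the loop then repairs.
    print(refine_loop(target_image, lambda _img: "x__x", toy_render, toy_refine))
```

The point of the sketch is the control flow: the difference image localizes the fault, and the refiner only needs to correct what the delta-view highlights instead of regenerating the whole source.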
Findings
Outperforms existing techniques and GPT-4V by at least 7.03% in exact match accuracy.
Achieves a 46.08% successful refinement rate for formulae.
Achieves a 25.51% successful refinement rate for tables.
Abstract
Portable Document Format (PDF) files are the dominant format for storing and disseminating scientific research, legal documents, and tax information. LaTeX is a popular application for creating PDF documents. Despite its advantages, LaTeX is not WYSIWYG (what you see is what you get): the LaTeX source and the rendered PDF images look drastically different, especially for formulae and tables. This gap makes it hard to modify or export LaTeX sources for formulae and tables from PDF images, and existing work is still limited. First, prior work generates LaTeX sources in a single iteration and struggles with complex LaTeX formulae. Second, existing work mainly recognizes and extracts LaTeX sources for formulae, and is incapable or ineffective for tables. This paper proposes LATTE, the first iterative refinement framework for LaTeX recognition. Specifically, we propose delta-view as feedback,…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Mathematics, Computing, and Information Processing
