Image-to-Markup Generation with Coarse-to-Fine Attention
Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush

TL;DR
This paper introduces a neural encoder-decoder with coarse-to-fine attention that converts images of mathematical expressions into LaTeX markup. It outperforms classical mathematical OCR systems on rendered data and, with pretraining, performs well on handwritten data.
Contribution
The paper proposes a scalable coarse-to-fine attention mechanism for image-to-markup generation, along with a new dataset and an efficient attention layer to reduce inference complexity.
Findings
Outperforms classical mathematical OCR systems by a large margin on in-domain rendered data
Performs well on out-of-domain handwritten data when pretrained
Introduces a new dataset of mathematical expressions with LaTeX markup
Abstract
We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that unlike neural OCR techniques using CTC-based models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data. To reduce the inference complexity associated with the attention-based approaches, we introduce a new coarse-to-fine attention layer that selects a support region before applying attention.
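The last sentence of the abstract describes a two-stage lookup: pick a support region first, then attend only inside it. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. It assumes dot-product scoring, a hard argmax choice of a single coarse cell, and square support regions of side `block`; the function name `coarse_to_fine_attention` and all parameter names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def coarse_to_fine_attention(query, coarse_feats, fine_feats, block):
    """Two-stage attention sketch (assumed design, not the paper's code).

    query:        decoder state, a list of floats
    coarse_feats: dict mapping a coarse cell (row, col) to a feature vector
    fine_feats:   2D grid (list of rows) of fine feature vectors
    block:        side length of the fine region covered by one coarse cell
    """
    # Stage 1: score every coarse cell and keep the highest-weighted one.
    cells = list(coarse_feats)
    coarse_w = softmax([dot(query, coarse_feats[c]) for c in cells])
    best = cells[max(range(len(cells)), key=lambda i: coarse_w[i])]

    # Stage 2: ordinary softmax attention, restricted to the support region
    # of fine cells that lie under the selected coarse cell.
    r0, c0 = best[0] * block, best[1] * block
    support = [(r, c)
               for r in range(r0, r0 + block)
               for c in range(c0, c0 + block)]
    fine_w = softmax([dot(query, fine_feats[r][c]) for r, c in support])

    # Context vector: attention-weighted sum of the fine features.
    context = [
        sum(w * fine_feats[r][c][d] for w, (r, c) in zip(fine_w, support))
        for d in range(len(query))
    ]
    return best, support, context
```

Under these assumptions, each decoding step scores one vector per coarse cell plus `block * block` fine vectors, rather than every cell of the fine grid, which is where the reduction in inference cost would come from.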
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques · Natural Language Processing Techniques
