Image-to-Markup Generation with Coarse-to-Fine Attention

Yuntian Deng; Anssi Kanervisto; Jeffrey Ling; Alexander M. Rush

arXiv:1609.04938·cs.CV·June 15, 2017·86 cites

TL;DR

This paper introduces a neural encoder-decoder with coarse-to-fine attention for converting images of mathematical expressions into LaTeX markup. It outperforms classical mathematical OCR systems by a large margin on rendered expressions and, with pretraining, also performs well on handwritten data.

Contribution

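The paper proposes a neural encoder-decoder for image-to-markup generation, introduces a new dataset of real-world rendered mathematical expressions paired with LaTeX markup, and presents a coarse-to-fine attention layer that selects a support region before applying attention, which reduces inference complexity.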
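The inference cost that a coarse-to-fine layer targets comes from standard soft attention, which scores every position of the encoder's feature grid at each decoding step. The sketch below illustrates that baseline in plain Python. Dot-product scoring, the flattened grid layout, and the helper names `softmax`, `dot`, and `soft_attention` are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def soft_attention(query, features):
    """Soft attention over a flattened H x W grid of encoder features.

    query:    decoder state vector (length D)
    features: list of H*W feature vectors (each length D)
    Returns (context vector, number of positions scored).
    Dot-product scoring is an assumption made for illustration.
    """
    # One score per grid position: cost grows with H*W at every decoding step.
    scores = [dot(query, v) for v in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Context is the attention-weighted average of all feature vectors.
    context = [sum(w * v[d] for w, v in zip(weights, features)) for d in range(dim)]
    return context, len(scores)
```

For a 4 x 6 feature grid, every call scores all 24 positions; a larger input image grows this count proportionally at each output token.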

Findings

01

Outperforms classical mathematical OCR systems by a large margin on in-domain rendered data

02

Performs well on out-of-domain handwritten data when pretrained

03

Introduces a new dataset of mathematical expressions with LaTeX markup

Abstract

We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that unlike neural OCR techniques using CTC-based models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data. To reduce the inference complexity associated with the attention-based approaches, we introduce a new coarse-to-fine attention layer that selects a support region before applying attention.
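The abstract's last sentence describes attention that first selects a support region and only then attends within it. A minimal sketch of that two-stage idea follows, assuming a coarse grid of K cells, each covering a fixed block of fine positions, dot-product scoring, and hard argmax selection of one cell at inference time. The paper's actual selection and training procedure may differ; the function name `coarse_to_fine_attention` and its argument layout are hypothetical.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def coarse_to_fine_attention(query, coarse_feats, fine_feats):
    """Two-stage attention sketch (hypothetical helper).

    coarse_feats: list of K coarse-cell vectors, one per candidate region
    fine_feats:   list of K lists; fine_feats[k] holds the fine-grid
                  vectors that lie inside coarse cell k
    Assumes dot-product scoring and hard argmax selection of one cell.
    Returns (context vector, selected cell index, positions scored).
    """
    # Stage 1: score only the K coarse cells.
    coarse_scores = [dot(query, c) for c in coarse_feats]
    # Select the support region: the single highest-scoring coarse cell.
    k = max(range(len(coarse_scores)), key=coarse_scores.__getitem__)
    region = fine_feats[k]
    # Stage 2: soft attention restricted to fine positions in that region.
    weights = softmax([dot(query, v) for v in region])
    dim = len(region[0])
    context = [sum(w * v[d] for w, v in zip(weights, region)) for d in range(dim)]
    scored = len(coarse_scores) + len(region)
    return context, k, scored


# Usage: 6 coarse cells x 4 fine positions each = 24 fine positions in total.
query = [1.0, 0.0]
coarse = [[float(i), 0.0] for i in range(6)]
fine = [[[float(i), float(j)] for j in range(4)] for i in range(6)]
context, cell, scored = coarse_to_fine_attention(query, coarse, fine)
```

In this toy setup a decoding step scores 6 coarse cells plus 4 fine positions (10 in total) instead of all 24 fine positions, which is the kind of saving a support-region step is meant to provide. Hard selection makes the step non-differentiable, so training such a layer needs a separate strategy.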

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques · Natural Language Processing Techniques