Automated LaTeX Code Generation from Handwritten Math Expressions Using Vision Transformer

Jayaprakash Sundararaj; Akhil Vyas; Benjamin Gonzalez-Maldonado

arXiv:2412.03853·cs.CV·December 10, 2024

TL;DR

This paper explores the use of vision transformer architectures for converting images of handwritten or digital math expressions into LaTeX code, showing that they outperform CNN-LSTM models in accuracy and BLEU score.

Contribution

It introduces vision transformer models for LaTeX code generation from math images and demonstrates their superior performance over CNN-RNN baselines.

Findings

01

Vision transformers outperform CNN-RNN in accuracy and BLEU scores.

02

Transformers achieve lower Levenshtein distances.

03

Results suggest that further improvements are possible through fine-tuning.

Abstract

Transforming mathematical expressions into LaTeX poses a significant challenge. In this paper, we examine the application of advanced transformer-based architectures to the task of converting images of handwritten or digital mathematical expressions into the corresponding LaTeX code. As a baseline, we use the current state-of-the-art CNN encoder and LSTM decoder. Additionally, we explore enhancements to the CNN-RNN architecture by replacing the CNN encoder with a pretrained ResNet50 model, modified to suit grayscale input. Further, we experiment with a vision transformer model and compare it with the baseline and the CNN-LSTM model. Our findings reveal that the vision transformer architectures outperform the baseline CNN-RNN framework, delivering higher overall accuracy and BLEU scores while achieving lower Levenshtein distances. Moreover, these results highlight the potential…
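
The abstract reports Levenshtein distance as one of its evaluation metrics. As a rough illustration of what that metric measures, the sketch below computes the edit distance between a reference and a predicted LaTeX sequence. The function name `levenshtein`, the whitespace tokenization, and the sample strings are assumptions made for this example; the paper's own tokenization and evaluation code are not shown on this page.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn the sequence `ref` into the sequence `hyp`."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp; start with the empty prefix of ref.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical example: whitespace-separated LaTeX tokens (an assumption).
reference = r"\frac { a } { b }".split()
prediction = r"\frac { a } { c }".split()
distance = levenshtein(reference, prediction)
```

A lower distance means the predicted LaTeX needs fewer token edits to match the reference, which is why the abstract treats lower values as better. Whether the distance is computed over tokens or characters changes the absolute numbers, so scores are only comparable under the same tokenization.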

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques · Open Education and E-Learning

Methods: Attention Is All You Need · Softmax · Dense Connections · Linear Layer · Multi-Head Attention · Tanh Activation · Layer Normalization · Residual Connection · Vision Transformer · Sigmoid Activation