Automated LaTeX Code Generation from Handwritten Math Expressions Using Vision Transformer
Jayaprakash Sundararaj, Akhil Vyas, Benjamin Gonzalez-Maldonado

TL;DR
This paper explores the use of vision transformer architectures for converting handwritten or digital math expressions into LaTeX code, showing they outperform traditional CNN-LSTM models in accuracy and BLEU scores.
Contribution
It introduces vision transformer models for LaTeX code generation from math images and demonstrates their superior performance over CNN-RNN baselines.
Findings
Vision transformers outperform the CNN-RNN baseline in accuracy and BLEU scores.
Transformers achieve lower Levenshtein distances.
Results suggest further improvements are possible through fine-tuning.
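For readers unfamiliar with the Levenshtein distance named in the findings, here is a minimal sketch of the standard dynamic-programming edit distance. It is not the authors' evaluation code: the function name `levenshtein`, the whitespace tokenization, and the sample LaTeX strings are assumptions made for illustration, and the summary does not say whether the paper measures distance over characters or tokens.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b.

    Works on any two sequences (strings or token lists).
    """
    # Keep the shorter sequence in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]  # deleting i items turns a[:i] into the empty sequence
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (or match at no cost)
            ))
        previous = current
    return previous[-1]


# Hypothetical example: compare a predicted LaTeX token sequence with a reference.
reference = r"\frac { a } { b }".split()
prediction = r"\frac { a } { c }".split()
distance = levenshtein(prediction, reference)
```

A lower distance means the generated LaTeX needs fewer edits to match the reference, which is why it complements exact-match accuracy and BLEU.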
Abstract
Transforming mathematical expressions into LaTeX poses a significant challenge. In this paper, we examine the application of advanced transformer-based architectures to the task of converting handwritten or digital mathematical expression images into the corresponding LaTeX code. As a baseline, we use the current state-of-the-art CNN encoder and LSTM decoder. Additionally, we explore enhancements to the CNN-RNN architecture by replacing the CNN encoder with the pretrained ResNet50 model, modified to suit grayscale input. Further, we experiment with a vision transformer model and compare it with the baseline and the CNN-LSTM model. Our findings reveal that the vision transformer architectures outperform the baseline CNN-RNN framework, delivering higher overall accuracy and BLEU scores while achieving lower Levenshtein distances. Moreover, these results highlight the potential…
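The abstract does not say how ResNet50 was modified for grayscale input. One common approach, shown here purely as an assumption and not as the authors' method, is to sum the pretrained first-layer kernel over its three input channels. The single-channel kernel then gives the same response to a grayscale image as the original kernel gives to that image replicated across RGB. The sketch uses NumPy with random stand-in weights instead of real pretrained ones; `collapse_rgb_kernel` and `filter_responses` are hypothetical helper names.

```python
import numpy as np


def collapse_rgb_kernel(weight_rgb):
    """Sum a (out_ch, 3, kH, kW) kernel over input channels -> (out_ch, 1, kH, kW)."""
    return weight_rgb.sum(axis=1, keepdims=True)


def filter_responses(weight, patch):
    """Response of every output filter to one patch of shape (in_ch, kH, kW)."""
    return np.einsum("oikl,ikl->o", weight, patch)


rng = np.random.default_rng(0)

# Stand-in for ResNet50's first convolution: 64 filters of size 7x7 over 3 channels.
# Random values are used here; a real setup would load pretrained weights instead.
weight_rgb = rng.normal(size=(64, 3, 7, 7))

# One grayscale patch, and the same patch replicated across three channels.
gray_patch = rng.normal(size=(1, 7, 7))
rgb_patch = np.repeat(gray_patch, 3, axis=0)

weight_gray = collapse_rgb_kernel(weight_rgb)

# The collapsed kernel on the grayscale patch matches the original on the replica.
same = np.allclose(
    filter_responses(weight_rgb, rgb_patch),
    filter_responses(weight_gray, gray_patch),
)
```

Summing rather than averaging keeps the activation scale that the later pretrained layers expect. Averaging would shrink first-layer outputs by a factor of three.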
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques · Open Education and E-Learning
Methods: Attention Is All You Need · Softmax · Dense Connections · Linear Layer · Multi-Head Attention · Tanh Activation · Layer Normalization · Residual Connection · Vision Transformer · Sigmoid Activation
