TAMER: Tree-Aware Transformer for Handwritten Mathematical Expression Recognition
Jianhua Zhu, Wenqi Zhao, Yu Li, Xingjian Hu, Liangcai Gao

TL;DR
TAMER is a novel Transformer-based model that effectively recognizes handwritten mathematical expressions by jointly modeling sequence and tree structures, leading to improved accuracy and structural correctness.
Contribution
Introduces TAMER, a Tree-aware Transformer that combines sequence and tree decoding for better understanding of mathematical expression structures.
Findings
Outperforms existing models on CROHME datasets
Achieves state-of-the-art accuracy in complex expression recognition
Enhances structural validity of generated LaTeX sequences
Abstract
Handwritten Mathematical Expression Recognition (HMER) has extensive applications in automated grading and office automation. However, existing sequence-based decoding methods, which directly predict LaTeX sequences, struggle to understand and model the inherent tree structure of LaTeX and often fail to ensure syntactic correctness in the decoded results. To address these challenges, we propose a novel model named TAMER (Tree-Aware Transformer) for handwritten mathematical expression recognition. TAMER introduces an innovative Tree-aware Module while maintaining the flexibility and efficient training of the Transformer. TAMER combines the advantages of both sequence decoding and tree decoding models by jointly optimizing sequence prediction and tree structure prediction tasks, which enhances the model's understanding and generalization of complex mathematical expression structures.…
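The joint optimization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes, purely for illustration, that the tree structure task is framed as predicting a parent position for each output token, and that the two cross-entropy terms are combined with a hypothetical weight `tree_weight`. The function names and argument layout are also assumptions.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(token_logits, token_targets,
               parent_logits, parent_targets,
               tree_weight=1.0):
    """Average sequence loss plus a weighted average tree-structure loss.

    token_logits[i]  : scores over the vocabulary at decoding step i
    parent_logits[i] : scores over candidate parent positions for token i
                       (hypothetical framing of the tree prediction task)
    tree_weight      : hypothetical balancing coefficient, not from the paper
    """
    seq = sum(cross_entropy(l, t)
              for l, t in zip(token_logits, token_targets)) / len(token_targets)
    tree = sum(cross_entropy(l, t)
               for l, t in zip(parent_logits, parent_targets)) / len(parent_targets)
    return seq + tree_weight * tree


# Toy example: two decoding steps, vocabulary of 3 symbols,
# and 2 candidate parent positions per token.
loss = joint_loss(
    token_logits=[[2.0, 0.1, 0.1], [0.1, 2.0, 0.1]],
    token_targets=[0, 1],
    parent_logits=[[1.5, 0.2], [0.2, 1.5]],
    parent_targets=[0, 1],
)
print(f"joint loss: {loss:.4f}")
```

Because both terms are backpropagated through the same decoder in such a setup, an error in the predicted structure penalizes the shared representation even when the token sequence alone looks plausible, which is the intuition behind combining sequence and tree decoding.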
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Natural Language Processing Techniques
Methods: Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax
