MMHMER: Multi-viewer and Multi-task for Handwritten Mathematical Expression Recognition
Kehua Chen, Haoyang Shen, Lifan Zhong, Mingyi Chen

TL;DR
This paper introduces MMHMER, a multi-viewer and multi-task framework combining CNN and Transformer architectures to improve handwritten mathematical expression recognition, achieving state-of-the-art accuracy on CROHME datasets.
Contribution
Proposes a novel multi-view, multi-task framework that effectively integrates CNN and Transformer models for enhanced recognition performance.
Findings
Achieves 63.96%, 62.51%, and 65.46% ExpRate on CROHME14, CROHME16, and CROHME19, respectively.
Effectively fuses CNN and Transformer strengths for complex expression handling.
Outperforms Posformer on CROHME14, CROHME16, and CROHME19 by absolute ExpRate gains of 1.28%, 1.48%, and 0.58%, respectively.
Abstract
Handwritten Mathematical Expression Recognition (HMER) methods have made remarkable progress, with most existing approaches based on either a hybrid CNN/RNN architecture with GRU or a Transformer architecture. Each of these has its strengths and weaknesses. Leveraging different model structures as viewers and effectively integrating their diverse capabilities presents an intriguing avenue for exploration. This involves addressing two key challenges: 1) how to fuse these two methods effectively, and 2) how to achieve higher performance at an appropriate level of complexity. This paper proposes an efficient CNN-Transformer multi-viewer, multi-task approach to enhance the model's recognition performance. Our MMHMER model achieves 63.96%, 62.51%, and 65.46% ExpRate on CROHME14, CROHME16, and CROHME19, outperforming Posformer with absolute gains of 1.28%, 1.48%, and 0.58%, respectively. The…
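To make the reported metric concrete, here is a minimal Python sketch of ExpRate as it is commonly defined for HMER: the percentage of expressions whose predicted token sequence matches the ground truth exactly. The function name `exp_rate` and the toy token lists are illustrative assumptions, not taken from the paper. The second part only does arithmetic on the abstract's numbers: subtracting each stated absolute gain from the MMHMER score gives the Posformer score those figures imply.

```python
def exp_rate(predictions, references):
    """Percentage of expressions predicted exactly right (hypothetical helper)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one expression is required")
    # An expression counts only if every token matches, in order.
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return 100.0 * correct / len(references)


# Toy data (assumed for illustration): three expressions as LaTeX token lists.
toy_references = [
    ["x", "^", "2"],
    ["a", "+", "b"],
    ["\\frac", "{", "1", "}", "{", "2", "}"],
]
toy_predictions = [
    ["x", "^", "2"],
    ["a", "-", "b"],  # one wrong token makes the whole expression wrong
    ["\\frac", "{", "1", "}", "{", "2", "}"],
]
toy_score = exp_rate(toy_predictions, toy_references)  # 2 of 3 correct

# (MMHMER ExpRate, absolute gain over Posformer) as stated in the abstract.
reported = {
    "CROHME14": (63.96, 1.28),
    "CROHME16": (62.51, 1.48),
    "CROHME19": (65.46, 0.58),
}

# Posformer scores implied by the abstract's numbers (derived, not quoted).
implied_posformer = {
    name: round(score - gain, 2) for name, (score, gain) in reported.items()
}

if __name__ == "__main__":
    print(f"toy ExpRate: {toy_score:.2f}%")
    for name, baseline in implied_posformer.items():
        print(f"{name}: implied Posformer ExpRate {baseline:.2f}%")
```

Because ExpRate is an exact-match metric, a single wrong symbol costs the whole expression, which is why absolute gains of around one point are treated as meaningful on these benchmarks.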
Taxonomy
Topics: Handwritten Text Recognition Techniques
