Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition

Yu Li; Jin Jiang; Jianhua Zhu; Shuai Peng; Baole Wei; Yuxuan Zhou; Liangcai Gao

arXiv:2505.23566·cs.CV·October 28, 2025

TL;DR

Uni-MuMER leverages a pretrained vision-language model to unify multiple tasks for handwritten mathematical expression recognition, achieving state-of-the-art results without architectural modifications.

Contribution

The paper introduces Uni-MuMER, a framework that fully fine-tunes a VLM for HMER and integrates three data-driven tasks to improve performance and generalization.

Findings

01. Outperforms existing models by 16-24% on the CROHME and HME100K datasets.

02. Achieves state-of-the-art results in zero-shot settings.

03. Demonstrates the effectiveness of multi-task fine-tuning for HMER.

Abstract

Handwritten Mathematical Expression Recognition (HMER) remains a persistent challenge in Optical Character Recognition (OCR) due to the inherent freedom of symbol layouts and variability in handwriting styles. Prior methods have faced performance bottlenecks by proposing isolated architectural modifications, making them difficult to integrate coherently into a unified framework. Meanwhile, recent advances in pretrained vision-language models (VLMs) have demonstrated strong cross-task generalization, offering a promising foundation for developing unified solutions. In this paper, we introduce Uni-MuMER, which fully fine-tunes a VLM for the HMER task without modifying its architecture, effectively injecting domain-specific knowledge into a generalist framework. Our method integrates three data-driven tasks: Tree-Aware Chain-of-Thought (Tree-CoT) for structured spatial reasoning,…

Code & Models

Repositories

bflameswift/uni-mumer (PyTorch, official)

Datasets

phxember/Uni-MuMER-Data (dataset · 225 downloads)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Processing and 3D Reconstruction