Mask & Match: Learning to Recognize Handwritten Math with Self-Supervised Attention
Shree Mitra, Ritabrata Chakraborty, Nilkanta Sahu

TL;DR
This paper introduces a self-supervised learning framework for handwritten mathematical expression recognition that leverages a novel attention mechanism and progressive masking to improve structural understanding without requiring labeled data.
Contribution
The paper proposes a self-supervised approach to recognizing handwritten math expressions, built on a new attention network and a progressive masking strategy, that reduces dependence on labeled datasets.
Findings
Outperforms existing SSL and supervised methods on CROHME benchmarks.
Effective self-supervised pretraining enhances recognition accuracy.
Progressive masking improves the model's robustness to occlusions.
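The progressive masking named in the last finding can be pictured as a curriculum in which the fraction of hidden image patches grows over training. The sketch below is only an illustration under stated assumptions: the linear ramp, the 0.1 and 0.6 endpoints, the patch grid, and the function names `mask_ratio`, `sample_patch_mask`, and `apply_mask` are all hypothetical and are not taken from the paper.

```python
import random

def mask_ratio(step, total_steps, start=0.1, end=0.6):
    """Linearly ramp the masked fraction from `start` to `end`.

    The linear shape and both endpoints are assumptions for illustration.
    """
    if total_steps <= 1:
        return end
    t = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + t * (end - start)

def sample_patch_mask(grid_h, grid_w, ratio, rng):
    """Return a grid_h x grid_w grid of flags; 1 marks a masked patch."""
    n = grid_h * grid_w
    k = min(max(round(ratio * n), 0), n)
    masked = set(rng.sample(range(n), k))
    return [[1 if r * grid_w + c in masked else 0 for c in range(grid_w)]
            for r in range(grid_h)]

def apply_mask(image, mask, patch, fill=0.0):
    """Blank out masked patches in an image stored as a list of rows.

    The image must have len(mask) * patch rows and len(mask[0]) * patch columns.
    """
    out = [row[:] for row in image]
    for r, mask_row in enumerate(mask):
        for c, flag in enumerate(mask_row):
            if flag:
                for y in range(r * patch, (r + 1) * patch):
                    for x in range(c * patch, (c + 1) * patch):
                        out[y][x] = fill
    return out

# Usage: an 8 x 16 image split into a 4 x 8 grid of 2 x 2 patches.
rng = random.Random(0)
image = [[1.0] * 16 for _ in range(8)]
for step in (0, 5, 9):
    ratio = mask_ratio(step, total_steps=10)
    masked_image = apply_mask(image, sample_patch_mask(4, 8, ratio, rng), patch=2)
```

Early steps hide few patches, so the task starts easy; later steps hide more, which is one plausible way a model could be pushed toward robustness to occlusion.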
Abstract
Handwritten mathematical expression recognition (HMER) is a challenging task due to the inherent two-dimensional structure, varying symbol scales, and complex spatial relationships among symbols. In this paper, we present a self-supervised learning (SSL) framework for HMER that eliminates the need for expensive labeled data. Our approach begins by pretraining an image encoder using a combination of global and local contrastive losses, enabling the model to learn both holistic and fine-grained representations. A key contribution of this work is a novel self-supervised attention network, which is trained using a progressive spatial masking strategy. This attention mechanism is designed to learn semantically meaningful focus regions, such as operators, exponents, and nested mathematical notation, without requiring any supervision. The progressive masking curriculum encourages the network to…
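The global and local contrastive pretraining mentioned in the abstract can be illustrated with a generic InfoNCE-style objective. This is a minimal sketch under assumptions, not the authors' loss: the summary does not give the exact formulation, so the InfoNCE form, the temperature of 0.1, the `local_weight` of 0.5, and the names `info_nce` and `combined_loss` are all hypothetical. Here "global" vectors stand for whole-image embeddings of two augmented views and "local" vectors stand for matching patch embeddings.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    anchors[i] should match positives[i]; every other positive in the
    batch acts as a negative for anchor i.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)

def combined_loss(global_a, global_b, local_a, local_b, local_weight=0.5):
    """Weighted sum of an image-level and a patch-level contrastive term.

    The additive form and the weight are assumptions for illustration.
    """
    return info_nce(global_a, global_b) + local_weight * info_nce(local_a, local_b)

# Usage: two views whose embeddings agree give a loss near zero.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
loss = combined_loss(view_a, view_b, view_a, view_b)
```

The global term rewards agreement between whole-image embeddings, while the local term does the same for patches, which is one way to obtain both holistic and fine-grained representations.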
Taxonomy
Topics: Handwritten Text Recognition Techniques · Hand Gesture Recognition Systems · Image Processing and 3D Reconstruction
