Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition
Jiaming Wang, Jun Du, Jianshu Zhang

TL;DR
This paper introduces a stroke constrained attention network (SCAN) that leverages stroke-level information for improved online and offline handwritten mathematical expression recognition, achieving state-of-the-art results.
Contribution
SCAN uses strokes as the basic unit for alignment and recognition in handwritten mathematical expression recognition (HMER), and in the multi-modal setting it fuses online and offline information at the encoder stage.
Findings
Achieves state-of-the-art accuracy on the CROHME benchmark.
Effectively fuses multi-modal information at the encoder level.
Reduces symbol segmentation difficulty through stroke grouping.
Abstract
In this paper, we propose a novel stroke constrained attention network (SCAN) which treats the stroke as the basic unit for encoder-decoder based online handwritten mathematical expression recognition (HMER). Unlike previous methods which use trace points or image pixels as basic units, SCAN makes full use of stroke-level information for better alignment and representation. The proposed SCAN can be adopted in both single-modal (online or offline) and multi-modal HMER. For single-modal HMER, SCAN first employs a CNN-GRU encoder to extract point-level features from input traces in online mode, or a CNN encoder to extract pixel-level features from input images in offline mode, and then uses stroke constrained information to convert them into online and offline stroke-level features. Using stroke-level features can explicitly group points or pixels belonging to the same stroke, therefore…
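The conversion from point-level (or pixel-level) features to stroke-level features can be illustrated with a small sketch. It assumes that the stroke constraint is available as a stroke index for every point or pixel, and that features sharing an index are averaged; the abstract excerpt does not state the exact pooling operator, so mean pooling here is an assumption. The function name `stroke_pool` and the toy data are hypothetical, and NumPy stands in for the network's encoder outputs.

```python
import numpy as np

def stroke_pool(features, stroke_ids, num_strokes):
    """Average the feature vectors that share a stroke index.

    features:   (N, D) point-level or flattened pixel-level features
    stroke_ids: (N,) stroke index of each point/pixel (-1 = no stroke)
    Returns a (num_strokes, D) array of stroke-level features.
    Mean pooling is an assumption made for illustration.
    """
    features = np.asarray(features, dtype=float)
    stroke_ids = np.asarray(stroke_ids)
    pooled = np.zeros((num_strokes, features.shape[1]))
    for s in range(num_strokes):
        mask = stroke_ids == s          # points/pixels belonging to stroke s
        if mask.any():                  # leave zeros for an empty stroke
            pooled[s] = features[mask].mean(axis=0)
    return pooled

# Toy input: 5 trace points with 2-dim features, written as 2 strokes.
point_feats = np.array([[1.0, 0.0],
                        [3.0, 0.0],
                        [0.0, 2.0],
                        [0.0, 4.0],
                        [0.0, 6.0]])
stroke_ids = np.array([0, 0, 1, 1, 1])

stroke_feats = stroke_pool(point_feats, stroke_ids, num_strokes=2)
print(stroke_feats.shape)  # (2, 2): one feature vector per stroke
```

Under this sketch the decoder would attend over one vector per stroke rather than one per point or pixel, which is the grouping effect the abstract describes. Because online and offline inputs of the same expression share the same strokes, both modes would yield the same number of stroke-level vectors.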
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Human Pose and Action Recognition
