Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition

Jiaming Wang; Jun Du; Jianshu Zhang

arXiv:2002.08670 · cs.CV · February 21, 2020 · 6 cites

TL;DR

This paper introduces a stroke constrained attention network (SCAN) that leverages stroke-level information for improved online and offline handwritten mathematical expression recognition, achieving state-of-the-art results.

Contribution

SCAN uses the stroke as its basic unit for better alignment and recognition in handwritten mathematical expression recognition (HMER), and fuses multi-modal (online and offline) information at the encoder stage.

Findings

01 Achieves state-of-the-art accuracy on the CROHME benchmark.

02 Effectively fuses multi-modal information at the encoder level.

03 Reduces symbol segmentation difficulty through stroke grouping.

Abstract

In this paper, we propose a novel stroke constrained attention network (SCAN) which treats the stroke as the basic unit for encoder-decoder based online handwritten mathematical expression recognition (HMER). Unlike previous methods, which use trace points or image pixels as basic units, SCAN makes full use of stroke-level information for better alignment and representation. The proposed SCAN can be adopted in both single-modal (online or offline) and multi-modal HMER. For single-modal HMER, SCAN first employs a CNN-GRU encoder to extract point-level features from input traces in online mode and a CNN encoder to extract pixel-level features from input images in offline mode, then uses stroke constrained information to convert them into online and offline stroke-level features. Using stroke-level features can explicitly group points or pixels belonging to the same stroke, therefore…
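As a rough illustration of the point-to-stroke conversion described in the abstract, the sketch below groups point-level feature vectors by stroke index and averages each group. The abstract does not state how the conversion is performed, so mean pooling is an assumption made here for illustration, and the names `pool_points_to_strokes`, `point_features`, and `stroke_ids` are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def pool_points_to_strokes(
    point_features: Sequence[Sequence[float]],
    stroke_ids: Sequence[int],
) -> List[List[float]]:
    """Average the point-level feature vectors that share a stroke index.

    Assumption: mean pooling stands in for the paper's stroke constrained
    conversion, which the abstract does not specify.
    """
    if len(point_features) != len(stroke_ids):
        raise ValueError("need exactly one stroke index per point")
    if not point_features:
        return []

    num_strokes = max(stroke_ids) + 1
    dim = len(point_features[0])
    sums = [[0.0] * dim for _ in range(num_strokes)]
    counts = [0] * num_strokes

    # Accumulate per-stroke sums and point counts.
    for feature, stroke in zip(point_features, stroke_ids):
        counts[stroke] += 1
        for d, value in enumerate(feature):
            sums[stroke][d] += value

    # Strokes with no points keep a zero vector (avoids division by zero).
    return [
        [total / count if count else 0.0 for total in row]
        for row, count in zip(sums, counts)
    ]


# Five trace points with 2-D features, drawn as two strokes (ids 0 and 1).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 2.0], [2.0, 4.0]]
strokes = [0, 0, 0, 1, 1]
stroke_features = pool_points_to_strokes(features, strokes)
print(stroke_features)  # [[3.0, 4.0], [1.0, 3.0]]
```

The same grouping would apply in offline mode by replacing trace points with the pixels covered by each stroke. In either case the decoder attends over one vector per stroke instead of one per point or pixel.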

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Human Pose and Action Recognition