A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions
Mirco Ramo, Guénolé C.M. Silvestre

TL;DR
This paper introduces a Transformer-based model for online handwritten gesture recognition that accurately constructs mathematical expression trees. The model shows robustness and high accuracy on a new dataset, and the approach has potential applications beyond handwriting recognition.
Contribution
The paper presents a novel Transformer architecture that encodes spatio-temporal gesture data to produce syntactically correct mathematical expression trees. It also contributes a new dataset and a new evaluation metric.
Findings
Achieved 94% normalized Levenshtein accuracy on expression tree predictions.
Successfully trained a small Transformer suitable for edge devices.
Demonstrated robustness to input ablation and unseen glyphs.
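The normalized Levenshtein accuracy cited above can be illustrated with a minimal sketch. The normalization used here (one minus the edit distance divided by the length of the longer sequence) and the flat token serialization of the expression tree are assumptions made for illustration; this summary does not state the paper's exact definition.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_accuracy(pred, ref):
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0  # two empty sequences are identical
    return 1.0 - levenshtein(pred, ref) / longest


# Hypothetical prefix-order serializations of a reference and a predicted tree.
ref = ["+", "x", "^", "y", "2"]
pred = ["+", "x", "^", "y", "3"]
score = normalized_accuracy(pred, ref)  # one substitution over five tokens
```

Under this assumed definition, a single wrong token in a five-token expression lowers the score by one fifth, so longer expressions are penalized less per error than shorter ones.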
Abstract
The Transformer architecture is shown to provide a powerful framework as an end-to-end model for building expression trees from online handwritten gestures corresponding to glyph strokes. In particular, the attention mechanism was successfully used to encode, learn and enforce the underlying syntax of expressions creating latent representations that are correctly decoded to the exact mathematical expression tree, providing robustness to ablated inputs and unseen glyphs. For the first time, the encoder is fed with spatio-temporal data tokens potentially forming an infinitely large vocabulary, which finds applications beyond that of online gesture recognition. A new supervised dataset of online handwriting gestures is provided for training models on generic handwriting recognition tasks and a new metric is proposed for the evaluation of the syntactic correctness of the output expression…
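To make the idea of an unbounded input vocabulary concrete, here is a minimal sketch of one way continuous spatio-temporal tokens could be mapped to model-sized vectors: a learned linear projection in place of an embedding lookup table. The (x, y, t) feature layout, the `D_MODEL` size, and the helper names are hypothetical; the abstract does not describe how the encoder input is actually constructed.

```python
import random

D_MODEL = 8  # hypothetical model width, chosen only for illustration


def make_projection(in_dim, out_dim, seed=0):
    """Randomly initialised weights and zero bias for a linear layer."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project(point, weights, bias):
    """Map one continuous token to a vector: W @ point + b."""
    return [sum(w * v for w, v in zip(row, point)) + b
            for row, b in zip(weights, bias)]


# Assumed token layout: (x, y, t) samples along a single stroke.
stroke = [(0.10, 0.20, 0.00), (0.15, 0.25, 0.02), (0.22, 0.31, 0.04)]

weights, bias = make_projection(in_dim=3, out_dim=D_MODEL)
embedded = [project(p, weights, bias) for p in stroke]
```

Because every real-valued point is a valid input, no finite token table is needed, which is the sense in which such a vocabulary can be unbounded.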
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Absolute Position Encodings · Layer Normalization
