Tokenization vs. Augmentation: A Systematic Study of Writer Variance in IMU-Based Online Handwriting Recognition
Jindong Li, Dario Zanca, Vincent Christlein, Tim Hamann, Jens Barth, Peter Kämpf, Björn Eskofier

TL;DR
This study systematically compares tokenization and data augmentation strategies in IMU-based online handwriting recognition, revealing their distinct effectiveness in handling inter- and intra-writer variability.
Contribution
It provides a comprehensive analysis of how sub-word tokenization and concatenation-based data augmentation impact recognition performance under different writer variability conditions.
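To make the idea of concatenation-based augmentation concrete, the following is a minimal sketch and not the authors' procedure, whose details are not given in this summary. It assumes each sample is a pair of an IMU signal (a list of per-timestep channel readings) and a word label, and that a synthetic sample is formed by joining two recordings in time and joining their labels without a separator. The function name `concat_augment` and its parameters are hypothetical.

```python
import random


def concat_augment(samples, n_new, seed=0):
    """Create n_new synthetic samples by joining two randomly chosen recordings.

    samples: list of (signal, label) pairs, where signal is a list of
             per-timestep channel readings and label is a string.
    Assumption: labels are joined directly, with no separator token.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to concatenate")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    augmented = []
    for _ in range(n_new):
        # Pick two distinct recordings and append one after the other.
        (sig_a, lab_a), (sig_b, lab_b) = rng.sample(samples, 2)
        augmented.append((sig_a + sig_b, lab_a + lab_b))
    return augmented
```

A real pipeline would likely also need to handle the signal boundary between the two recordings (for example, pen-lift segments or sensor offsets); that is left out here.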
Findings
Bigram tokenization improves writer-independent recognition, reducing the word error rate (WER) from 15.40% to 12.99%.
Data augmentation significantly reduces error rates in writer-dependent scenarios.
Short, low-level tokens enhance model performance.
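As a rough illustration of what a short, low-level token scheme could look like, the sketch below splits a word label into character bigrams. It is an assumption made for illustration: this summary does not specify the paper's tokenizer, so the non-overlapping split, the handling of a trailing odd character, and the name `bigram_tokenize` are all hypothetical.

```python
def bigram_tokenize(word: str) -> list[str]:
    """Split a word into non-overlapping character bigrams.

    Assumption: a trailing odd character is kept as a single-character token.
    """
    return [word[i:i + 2] for i in range(0, len(word), 2)]


print(bigram_tokenize("hello"))  # ['he', 'll', 'o']
```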
Abstract
Inertial measurement unit (IMU)-based online handwriting recognition enables the recognition of input signals collected across different writing surfaces but remains challenged by uneven character distributions and inter-writer variability. In this work, we systematically investigate two strategies to address these issues: sub-word tokenization and concatenation-based data augmentation. Our experiments on the OnHW-Words500 dataset reveal a clear dichotomy between handling inter-writer and intra-writer variance. On the writer-independent split, structural abstraction via Bigram tokenization significantly improves generalization to unseen writing styles, reducing the word error rate (WER) from 15.40% to 12.99%. In contrast, on the writer-dependent split, tokenization degrades performance due to vocabulary distribution shifts between the training and validation sets. Instead, our proposed…
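To put the reported WER figures in context, the sketch below computes a word error rate and the relative reduction between two rates. It assumes each sample is a single isolated word, so the WER reduces to the percentage of misrecognized words; the function names are hypothetical and this is not the authors' evaluation code.

```python
def word_error_rate(predictions, references):
    """Percentage of samples whose predicted word differs from the reference.

    Assumption: one word per sample, so no edit-distance alignment is needed.
    """
    if len(predictions) != len(references) or not references:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = sum(p != r for p, r in zip(predictions, references))
    return 100.0 * errors / len(references)


def relative_reduction(baseline, improved):
    """Relative error reduction in percent, given two error rates."""
    return 100.0 * (baseline - improved) / baseline


# Figures reported for the writer-independent split: 15.40% -> 12.99% WER.
print(round(relative_reduction(15.40, 12.99), 2))  # 15.65
```

Under these assumptions, the drop from 15.40% to 12.99% is an absolute reduction of 2.41 percentage points, or roughly 15.65% relative to the baseline.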
Taxonomy
Topics: Handwritten Text Recognition Techniques · Interactive and Immersive Displays · Writing and Handwriting Education
