Handwritten Stenography Recognition and the LION Dataset
Raphaela Heil, Malin Nauwerck

TL;DR
This paper establishes a baseline for handwritten stenography recognition on the novel LION dataset and shows that incorporating selected aspects of stenographic theory, together with pre-training on synthetic data, reduces recognition error rates.
Contribution
The paper introduces and publicly releases the LION dataset, applies a state-of-the-art text recognition model to stenography, and explores four target-sequence encodings that incorporate stenographic domain knowledge.
Findings
Baseline: average test CER of 29.81% and WER of 55.14%.
Encoding selected stenographic features into the target sequence reduces CER to 24.5-26%.
Pre-training on synthetic data further improves accuracy.
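To put the encoding results in perspective, the relative CER reduction can be worked out from the two figures reported above. This is simple arithmetic on the summary's numbers, not a result stated by the authors, and it assumes the 24.5-26% range is measured on the same test split as the 29.81% baseline:

```latex
\Delta_{\mathrm{rel}} = \frac{\mathrm{CER}_{\mathrm{base}} - \mathrm{CER}_{\mathrm{enc}}}{\mathrm{CER}_{\mathrm{base}}},
\qquad
\frac{29.81 - 26.0}{29.81} \approx 0.128,
\qquad
\frac{29.81 - 24.5}{29.81} \approx 0.178
```

That is, the encodings lower CER by roughly 3.8 to 5.3 percentage points, or about 13% to 18% relative to the baseline.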
Abstract
Purpose: In this paper, we establish a baseline for handwritten stenography recognition, using the novel LION dataset, and investigate the impact of incorporating selected aspects of stenographic theory into the recognition process. We make the LION dataset publicly available with the aim of encouraging future research in handwritten stenography recognition. Methods: A state-of-the-art text recognition model is trained to establish a baseline. Stenographic domain knowledge is integrated by applying four different encoding methods that transform the target sequence into representations that approximate selected aspects of the writing system. Results are further improved by a pre-training scheme based on synthetic data. Results: The baseline model achieves an average test character error rate (CER) of 29.81% and a word error rate (WER) of 55.14%. Test error rates are…
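The abstract reports character error rate (CER) and word error rate (WER). Both are conventionally computed as the Levenshtein (edit) distance between the reference transcription and the model output, divided by the length of the reference in characters or in whitespace-separated words. The following is a minimal Python sketch of that standard definition; the function names are hypothetical, and whether the paper averages per line, pools over the whole test set, or averages over several training runs is not stated in this summary.

```python
def levenshtein(ref, hyp) -> int:
    """Minimum number of substitutions, insertions and deletions turning ref into hyp.

    Works on any sequences: strings (character level) or lists of words (word level).
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # delete r from ref
                curr[j - 1] + 1,          # insert h into ref
                prev[j - 1] + (r != h),   # substitute (free when r == h)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edit distance / reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edit distance / number of reference words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference transcription must contain at least one word")
    return levenshtein(ref_words, hyp.split()) / len(ref_words)
```

For example, `cer("kitten", "sitting")` is 3 / 6 = 0.5, and `wer("the cat sat", "the cat sit")` is 1 / 3. Because insertions count as errors, both rates can exceed 100% when the hypothesis is much longer than the reference, and a single wrong character already makes its whole word wrong, which is one reason WER is typically much higher than CER, as in the baseline figures above.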
