Language Model Supervision for Handwriting Recognition Model Adaptation
Chris Tensmeyer, Curtis Wigington, Brian Davis, Seth Stewart, Tony Martinez, William Barrett

TL;DR
This paper introduces a transfer learning approach that adapts handwriting recognition models from high-resource to low-resource languages using unlabeled data and language models, reducing the need for extensive labeled datasets.
Contribution
The paper presents a novel transfer learning methodology that uses a target-language language model to adapt handwriting recognition (HWR) models from a high-resource source language to a low-resource target language written in the same script.
Findings
Improved transferability among French, English, and Spanish handwriting datasets.
In the best cases, character error rates nearly match those of fully supervised training.
Method reduces reliance on large labeled datasets for low-resource languages.
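The findings are stated in terms of character error rate (CER). The sketch below uses the common definition: total character-level edit distance between predicted and reference transcriptions, divided by the total number of reference characters. The summary does not say exactly how the paper normalizes CER, so micro-averaging over lines is an assumption here. The function names `edit_distance` and `cer` are illustrative.

```python
from typing import List


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # delete r from the reference
                           cur[j - 1] + 1,            # insert h from the hypothesis
                           prev[j - 1] + (r != h)))   # substitute (free when equal)
        prev = cur
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Micro-averaged CER: total edits over total reference characters (assumes refs are non-empty)."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


if __name__ == "__main__":
    # One substitution ("s" -> "z") over 7 reference characters.
    print(cer(["la casa"], ["la caza"]))
```

Micro-averaging weights long lines more heavily than averaging per-line CERs would. Many HWR evaluations use this convention, but it is not confirmed for this paper.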
Abstract
Training state-of-the-art offline handwriting recognition (HWR) models requires large labeled datasets, but unfortunately such datasets are not available in all languages and domains due to the high cost of manual labeling. We address this problem by showing how high resource languages can be leveraged to help train models for low resource languages. We propose a transfer learning methodology where we adapt HWR models trained on a source language to a target language that uses the same writing script. This methodology only requires labeled data in the source language, unlabeled data in the target language, and a language model of the target language. The language model is used in a bootstrapping fashion to refine predictions in the target language for use as ground truth in training the model. Using this approach we demonstrate improved transferability among French, English, and Spanish…
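The abstract describes a bootstrapping step in which a target-language language model refines the recognizer's predictions so they can serve as ground truth. A minimal sketch of that pseudo-labeling step follows. It rests on several assumptions. The source-trained recognizer is assumed to already emit an n-best list of (transcription, log-probability) pairs for each unlabeled target-language line. The language model is a hypothetical add-one-smoothed character bigram model. The two scores are combined by simple n-best rescoring. Rescoring is a stand-in chosen for brevity and is not necessarily how the authors integrate the language model. `CharBigramLM`, `pseudo_label`, `bootstrap_round` and `lm_weight` are all illustrative names.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# (transcription, recognizer log-probability) for one line image; assumed input format.
Hypothesis = Tuple[str, float]


class CharBigramLM:
    """Hypothetical add-one-smoothed character bigram LM for the target language."""

    def __init__(self, corpus: List[str]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = set()
        for line in corpus:
            chars = ["<s>"] + list(line) + ["</s>"]
            vocab.update(chars)
            # Every symbol except the final </s> acts as a context for the next one.
            self.context_counts.update(chars[:-1])
            self.bigram_counts.update(zip(chars[:-1], chars[1:]))
        self.vocab_size = len(vocab)

    def log_prob(self, text: str) -> float:
        """Sum of smoothed log P(cur | prev) over the padded character sequence."""
        chars = ["<s>"] + list(text) + ["</s>"]
        total = 0.0
        for prev, cur in zip(chars[:-1], chars[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def pseudo_label(nbest: List[Hypothesis], lm: CharBigramLM, lm_weight: float = 1.0) -> str:
    """Pick the hypothesis with the best combined recognizer + weighted LM score."""
    best_text, _ = max(nbest, key=lambda hyp: hyp[1] + lm_weight * lm.log_prob(hyp[0]))
    return best_text


def bootstrap_round(
    unlabeled: Dict[str, List[Hypothesis]], lm: CharBigramLM, lm_weight: float = 1.0
) -> Dict[str, str]:
    """Map each unlabeled line id to an LM-refined transcription to be used as ground truth."""
    return {line_id: pseudo_label(nbest, lm, lm_weight) for line_id, nbest in unlabeled.items()}


if __name__ == "__main__":
    lm = CharBigramLM(["la casa es grande", "el perro come", "la casa"])
    # The recognizer slightly prefers "la caza"; the LM, which has seen "casa", overrides it.
    unlabeled = {"line_001": [("la caza", -1.0), ("la casa", -1.2)]}
    print(bootstrap_round(unlabeled, lm))  # {'line_001': 'la casa'}
```

In a full bootstrapping loop, the source-trained model would be fine-tuned on these pseudo-labels. Its improved predictions would then be re-labeled in the next round. That training step is omitted here. `lm_weight` controls how strongly the language model can overrule the recognizer. Setting it to 0 reduces the sketch to plain self-training on the recognizer's own top hypothesis.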
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling
