TL;DR
This paper introduces a progressive few-shot learning method for low-resource handwritten text recognition. Synthetic pretraining and unsupervised pseudo-labeling reduce the annotation effort while keeping recognition results competitive.
Contribution
The paper presents a progressive learning approach that combines synthetic pretraining with unsupervised pseudo-labeling for efficient low-resource handwritten text recognition.
Findings
Achieves competitive recognition accuracy with minimal human annotation.
Reduces annotation effort significantly through pseudo-labeling.
Effective across different manuscript datasets.
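As an illustration of the pseudo-labeling idea in the findings above, the following is a minimal sketch, not the authors' implementation. It assumes that each unlabeled symbol crop already has a similarity score per alphabet symbol, and that a crop is kept as a pseudo-label only when its best score clears a fixed confidence threshold. The function name `select_pseudo_labels`, the `scores` layout, and the threshold value are all hypothetical.

```python
from typing import Dict, List, Tuple


def select_pseudo_labels(
    scores: Dict[str, Dict[str, float]],
    threshold: float = 0.9,
) -> List[Tuple[str, str]]:
    """Return (crop_id, symbol) pairs whose best score is at least `threshold`.

    `scores` maps a hypothetical crop identifier to a dict of
    {symbol: similarity score}. Both the layout and the fixed threshold
    are assumptions made for this illustration.
    """
    selected = []
    for crop_id, per_symbol in scores.items():
        if not per_symbol:
            continue  # nothing to choose from for this crop
        best_symbol = max(per_symbol, key=per_symbol.get)
        if per_symbol[best_symbol] >= threshold:
            selected.append((crop_id, best_symbol))
    return selected
```

In a progressive scheme, the selected pairs would be added to the labeled set and the model retrained, so that later rounds can label crops that were too uncertain at first.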
Abstract
Handwritten text recognition in low-resource scenarios, such as manuscripts with rare alphabets, is a challenging problem. The main difficulty comes from the scarcity of annotated data and the limited linguistic information (e.g. dictionaries and language models). Thus, we propose a few-shot learning-based handwriting recognition approach that significantly reduces the human annotation effort, requiring only a few images of each alphabet symbol. The method consists of detecting all the symbols of a given alphabet in a text-line image and decoding the obtained similarity scores into the final sequence of transcribed symbols. Our model is first pretrained on synthetic line images generated from any alphabet, even one that differs from the target domain. A second training step is then applied to reduce the gap between the source and target data. Since this retraining would require…
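The abstract describes decoding per-symbol similarity scores into a transcription but does not say how. The sketch below uses a simple greedy rule as a stand-in, which is an assumption and not the paper's decoder. It takes the best-scoring symbol at each horizontal position, discards positions below a threshold, and merges adjacent repeats. The function name `decode_similarity`, the matrix layout (one row per alphabet symbol, one column per position), and the threshold are hypothetical.

```python
from typing import List, Sequence


def decode_similarity(
    similarity: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    threshold: float = 0.5,
) -> List[str]:
    """Greedy decoding of a (symbols x positions) similarity matrix.

    Assumed layout: similarity[i][j] is the score of alphabet[i] at
    horizontal position j of the line image. This greedy rule is an
    illustrative stand-in, not the decoder used in the paper.
    """
    if not similarity:
        return []
    decoded: List[str] = []
    previous = None
    for col in range(len(similarity[0])):
        column = [row[col] for row in similarity]
        best = max(range(len(column)), key=column.__getitem__)
        # Positions where no symbol is confident enough count as gaps.
        current = alphabet[best] if column[best] >= threshold else None
        # Emit a symbol only when it starts a new run.
        if current is not None and current != previous:
            decoded.append(current)
        previous = current
    return decoded
```

Because a gap resets the previous symbol, two occurrences of the same symbol separated by a low-score position are kept as two symbols, while consecutive positions covering one wide symbol collapse into one.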
