Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription
Nikolai Vogler, Jonathan Parkes Allen, Matthew Thomas Miller, Taylor Berg-Kirkpatrick

TL;DR
This paper introduces a self-supervised pre-training method for low-resource historical document transcription. By learning robust visual language representations, it meaningfully improves recognition accuracy with minimal labeled data.
Contribution
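It proposes a masked language model-style pre-training strategy that improves recognition of handwritten and printed historical documents under limited supervision. The model learns to identify the true masked visual representation among distractors sampled from the same line.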
Findings
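Improved transcription accuracy over the same supervised model trained from scratch, with as few as 30 labeled line images.
Learned contextualized representations that are robust to scribal writing style and printing noise.
Evaluated on both handwritten Islamicate manuscripts and early modern English printed documents.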
Abstract
We present a self-supervised pre-training approach for learning rich visual language representations for both handwritten and printed historical document transcription. After supervised fine-tuning of our pre-trained encoder representations for low-resource document transcription on two languages, (1) a heterogeneous set of handwritten Islamicate manuscript images and (2) early modern English printed documents, we show a meaningful improvement in recognition accuracy over the same supervised model trained from scratch with as few as 30 line image transcriptions for training. Our masked language model-style pre-training strategy, where the model is trained to identify the true masked visual representation among distractors sampled from within the same line, encourages learning robust contextualized language representations invariant to scribal writing style and printing noise…
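The pre-training objective described in the abstract can be illustrated with a minimal contrastive sketch. It assumes an InfoNCE-style formulation with cosine similarity and a temperature of 0.1, and the helpers `sample_distractors` and `masked_contrastive_loss` are hypothetical names. The abstract does not specify the similarity function, temperature, masking scheme, or number of distractors, so this is not the authors' implementation. Given the encoder's contextualized output at a masked position and the true (unmasked) visual representations for every position in the same line, the loss is low when that output is closer to the true target than to the within-line distractors.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def sample_distractors(line_feats: Sequence[Vector], masked_idx: int,
                       num_distractors: int, rng: random.Random) -> List[int]:
    """Hypothetical helper: draw distractor positions from the SAME line,
    excluding the masked position itself."""
    candidates = [i for i in range(len(line_feats)) if i != masked_idx]
    k = min(num_distractors, len(candidates))
    return rng.sample(candidates, k)


def masked_contrastive_loss(context_out: Vector, line_feats: Sequence[Vector],
                            masked_idx: int, distractor_idx: Sequence[int],
                            temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE-style loss: negative log-probability that the
    encoder output at a masked position selects the true target among the
    distractors. Assumption: cosine similarity and temperature=0.1
    (neither is given in the source)."""
    candidates = [masked_idx] + list(distractor_idx)  # true target is first
    logits = [cosine(context_out, line_feats[i]) / temperature for i in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[0]


if __name__ == "__main__":
    # Toy line of four 2-D "frame" features; position 0 is masked.
    line = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    rng = random.Random(0)
    distractors = sample_distractors(line, masked_idx=0, num_distractors=3, rng=rng)
    aligned = masked_contrastive_loss(line[0], line, 0, distractors)
    misaligned = masked_contrastive_loss(line[1], line, 0, distractors)
    print(f"aligned loss: {aligned:.4f}, misaligned loss: {misaligned:.4f}")
```

Drawing distractors from the same line, rather than from other documents, means the target cannot be singled out by coarse cues such as page background or ink color. The model has to rely on the surrounding context instead. This is one plausible reading of why the objective encourages representations that are invariant to writing style and printing noise.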
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Multimodal Machine Learning Applications
