Self-supervised Pre-training of Text Recognizers
Martin Kišš, Michal Hradiš

TL;DR
This paper explores self-supervised pre-training methods for document text recognition, demonstrating their effectiveness on historical datasets and highlighting their potential as a foundation for future research in the field.
Contribution
It introduces novel self-supervised pre-training techniques for text recognition and evaluates their performance, providing insights into their benefits and limitations compared to transfer learning.
Findings
Self-supervised pre-training improves recognition performance on target domain data.
Pre-training on target domain data is more effective than transfer learning from related domains.
Joint-embedding approaches with VICReg and NT-Xent are promising but face challenges like model collapse.
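To make the NT-Xent objective named above concrete, here is a minimal pure-Python sketch of the standard normalized temperature-scaled cross-entropy loss over two batches of paired embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names `cosine` and `nt_xent` and the temperature of 0.5 are hypothetical choices, and embeddings are assumed to be non-zero vectors.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors (assumption: no zero vectors).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(view_a, view_b, temperature=0.5):
    # view_a[i] and view_b[i] are embeddings of two views of the same sample.
    # The temperature value is an illustrative assumption, not from the paper.
    assert len(view_a) == len(view_b)
    n = len(view_a)
    z = list(view_a) + list(view_b)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the positive partner of z[i]
        # Similarities to every other embedding (self-similarity excluded).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)
```

In this sketch, a fully collapsed model (every embedding identical) gives a loss of log(2N - 1) for N pairs, which is the value obtained when the positive cannot be told apart from the negatives.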
Abstract
In this paper, we investigate self-supervised pre-training methods for document text recognition. Nowadays, large unlabeled datasets can be collected for many research tasks, including text recognition, but annotating them is costly, which motivates research into methods that use unlabeled data. We study self-supervised pre-training methods based on masked label prediction using three different approaches: Feature Quantization, VQ-VAE, and Post-Quantized AE. We also investigate joint-embedding approaches with VICReg and NT-Xent objectives, for which we propose an image shifting technique to prevent a form of model collapse in which the model relies solely on positional encoding while completely ignoring the input image. We perform our experiments on historical handwritten (Bentham) and historical printed datasets mainly to investigate the benefits of the self-supervised pre-training techniques with…
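For readers unfamiliar with the VICReg objective mentioned in the abstract, the following pure-Python sketch shows its three standard terms: invariance, variance, and covariance. It is a sketch under assumptions, not the paper's code: the function names are hypothetical, the weights 25, 25, and 1 and `eps=1e-4` are commonly cited defaults rather than values taken from this paper, and each batch is assumed to hold at least two embeddings.

```python
import math

def _centered(z):
    # Subtract the per-dimension batch mean from every embedding.
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in z]

def variance_term(z, eps=1e-4):
    # Hinge on the per-dimension standard deviation: penalizes std below 1.
    n, d = len(z), len(z[0])
    c = _centered(z)
    hinge = 0.0
    for j in range(d):
        var = sum(c[i][j] ** 2 for i in range(n)) / (n - 1)
        hinge += max(0.0, 1.0 - math.sqrt(var + eps))
    return hinge / d

def covariance_term(z):
    # Sum of squared off-diagonal covariance entries, divided by dimension.
    n, d = len(z), len(z[0])
    c = _centered(z)
    total = 0.0
    for j in range(d):
        for k in range(d):
            if j != k:
                cov = sum(c[i][j] * c[i][k] for i in range(n)) / (n - 1)
                total += cov ** 2
    return total / d

def invariance_term(za, zb):
    # Mean squared Euclidean distance between paired embeddings.
    n = len(za)
    return sum(sum((a - b) ** 2 for a, b in zip(ra, rb))
               for ra, rb in zip(za, zb)) / n

def vicreg(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0):
    # Weights are assumed defaults, not values reported in this paper.
    return (sim_w * invariance_term(za, zb)
            + var_w * (variance_term(za) + variance_term(zb))
            + cov_w * (covariance_term(za) + covariance_term(zb)))
```

The variance term is what discourages collapse in this formulation: if all embeddings in a batch are identical, every dimension has near-zero standard deviation and the hinge is close to its maximum.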
Taxonomy
Topics: Handwritten Text Recognition Techniques
Methods: Normalized Temperature-scaled Cross Entropy Loss · Autoencoders · VQ-VAE
