Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement
Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas

TL;DR
Text-DIAE is a self-supervised autoencoder that improves text recognition and document enhancement by learning degradation invariance through transformer-based pretext tasks, outperforming existing methods with less data.
Contribution
Introduces a novel self-supervised transformer autoencoder with tailored pretext tasks for degradation-invariant text recognition and document enhancement.
Findings
Outperforms state-of-the-art methods in text recognition and document enhancement.
Requires fewer data samples to achieve convergence.
Does not rely on contrastive losses, avoiding their limitations.
Abstract
In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labeled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and…
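The abstract does not name the three pretext tasks, so the following is only a minimal sketch of the general idea of degradation-invariant pre-training: degrade an unlabeled image, then train a model to reconstruct the clean version. The three degradations used here (patch masking, blurring, additive noise), all function names, and all parameter values are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical degradations; names and defaults are illustrative assumptions.

def mask_patches(img, patch=4, ratio=0.5, rng=None):
    """Zero out a random subset of non-overlapping patch x patch blocks."""
    if rng is None:
        rng = np.random.default_rng()
    out = img.copy()
    h, w = img.shape
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            if rng.random() < ratio:
                out[y:y + patch, x:x + patch] = 0.0
    return out

def box_blur(img, k=3):
    """Blur with a k x k mean filter (k odd), padding edges by replication."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def add_noise(img, sigma=0.1, rng=None):
    """Add Gaussian noise and clip back to the [0, 1] intensity range."""
    if rng is None:
        rng = np.random.default_rng()
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def make_pretext_pairs(clean, rng):
    """Build (degraded input, clean target) pairs, one per assumed pretext task."""
    return {
        "mask": (mask_patches(clean, rng=rng), clean),
        "blur": (box_blur(clean), clean),
        "noise": (add_noise(clean, rng=rng), clean),
    }

def reconstruction_loss(pred, target):
    """Mean squared error between a reconstruction and the clean target."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
clean = rng.random((8, 16))  # stand-in for an unlabeled text-line image
pairs = make_pretext_pairs(clean, rng)
```

No labels appear anywhere in this setup: the clean image itself serves as the training target, which is what makes the objective self-supervised. In a real pipeline an encoder-decoder would map each degraded input to a prediction that is scored with `reconstruction_loss`.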
Taxonomy
Topics: Handwritten Text Recognition Techniques · Speech Recognition and Synthesis · Human Pose and Action Recognition
