Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornés; Yousri Kessentini; Josep Lladós; Lluis Gomez; Dimosthenis Karatzas

arXiv:2203.04814·cs.CV·August 19, 2022

TL;DR

Text-DIAE is a self-supervised, transformer-based autoencoder that learns degradation-invariant representations through three pretext tasks, improving text recognition and document enhancement while needing fewer data samples to converge than contrastive methods.

Contribution

Introduces a novel self-supervised transformer autoencoder with tailored pretext tasks for degradation-invariant text recognition and document enhancement.

Findings

1. Outperforms state-of-the-art in text recognition and document enhancement.
2. Requires fewer data samples to achieve convergence.
3. Does not rely on contrastive losses, avoiding their limitations.

Abstract

In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labeled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and…

Code & Models

Repositories

dali92002/SSL-OCR (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Speech Recognition and Synthesis · Human Pose and Action Recognition