Sequence-to-Sequence Contrastive Learning for Text Recognition

Aviad Aberdam; Ron Litman; Shahar Tsiper; Oron Anschel; Ron Slossberg; Shai Mazor; R. Manmatha; Pietro Perona

arXiv:2012.10873 · cs.CV · December 22, 2020


TL;DR

This paper introduces SeqCLR, a sequence-to-sequence contrastive learning framework for text recognition. A text decoder trained on its learned visual representations outperforms non-sequential contrastive methods, and performance improves significantly when supervision is reduced.

Contribution

The paper presents a novel sequence-to-sequence contrastive learning approach with sub-word level contrast, new augmentation heuristics, and encoder architectures for text recognition.

Findings

01. Outperforms non-sequential contrastive methods on text recognition tasks.

02. Significantly improves performance with limited supervision.

03. Achieves state-of-the-art results on handwritten text benchmarks.

Abstract

We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast at a sub-word level, where from each image we extract several positive pairs and multiple negative examples. To yield effective visual representations for text recognition, we further suggest novel augmentation heuristics, different encoder architectures and custom projection heads. Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods. In addition, when the amount of supervision is reduced, SeqCLR significantly improves performance…
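The idea of dividing each feature map into instances and contrasting them can be illustrated with a minimal sketch. This is not the authors' implementation: the window-averaging split, the function names (`window_instances`, `nt_xent`, `seq_contrastive_loss`), the NT-Xent form of the loss, and the default temperature are all assumptions chosen for illustration. It also assumes the two augmented views stay horizontally aligned, so that window i of one view corresponds to window i of the other.

```python
import numpy as np


def window_instances(frames, num_instances):
    """Split a (T, D) frame sequence into contiguous windows and average each.

    Hypothetical instance mapping; assumes T >= num_instances.
    """
    chunks = np.array_split(frames, num_instances, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])


def nt_xent(z_a, z_b, temperature=0.1):
    """NT-Xent loss where row i of z_a and row i of z_b form a positive pair.

    Every other row in either view acts as a negative example.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = (z @ z.T) / temperature
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    sim = sim - sim.max(axis=1, keepdims=True)        # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return float(-log_prob[np.arange(2 * n), positives].mean())


def seq_contrastive_loss(feats_a, feats_b, num_instances=4, temperature=0.1):
    """Contrast sub-sequence instances from two views of the same batch.

    feats_a, feats_b: (B, T, D) encoder outputs for the two augmented views.
    Each image contributes num_instances positive pairs instead of one.
    """
    inst_a = np.concatenate([window_instances(f, num_instances) for f in feats_a])
    inst_b = np.concatenate([window_instances(f, num_instances) for f in feats_b])
    return nt_xent(inst_a, inst_b, temperature)
```

With a batch of B images and `num_instances` windows each, the loss sees B × `num_instances` positive pairs, which is the sense in which one image yields several positive pairs and multiple negatives. A non-sequential baseline would correspond to `num_instances=1`, where each image is pooled into a single instance.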


Taxonomy

Methods: Contrastive Learning