Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Yair Kittenplon; Inbal Lavi; Sharon Fogel; Yarin Bar; R. Manmatha; Pietro Perona

arXiv:2202.05508 · cs.CV · February 15, 2022 · 1 citation

TL;DR

This paper introduces TextTranSpotter, a transformer-based text spotting framework that can be trained on either fully- or weakly-supervised data, reducing annotation costs while remaining competitive with fully-supervised methods.

Contribution

It presents the first text spotting framework that can be trained with weak supervision, combining a transformer architecture with a novel loss function based on the Hungarian loss so that training requires only transcription annotations.

Findings

1. Weakly-supervised training achieves results competitive with previous state-of-the-art fully-supervised methods.
2. State-of-the-art performance in the fully-supervised mode.
3. Reduces the need for detailed localization annotations.

Abstract

End-to-end text spotting methods have recently gained attention in the literature because they jointly optimize the text detection and recognition components. Existing methods usually keep the detection and recognition branches distinctly separate, requiring exact annotations for both tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework that can be trained in both fully- and weakly-supervised settings. By learning a single latent representation per word detection and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves performance competitive with previous state-of-the-art fully-supervised methods. When trained…
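The abstract does not spell out the matching loss, so what follows is only a minimal sketch of the general idea, assuming a DETR-style setup in which each query emits per-position character distributions and a box. Ground-truth words are matched to queries with the Hungarian algorithm using a text-only cost (the weakly-supervised case); an L1 box term is added only when boxes are available (the fully-supervised case). The names `text_cost`, `matching_cost`, `make_pred`, the padding symbol `PAD`, the fixed `max_len` and the L1 box term are all hypothetical and are not taken from the paper.

```python
import math

PAD = "~"  # hypothetical padding symbol for positions after the word ends


def hungarian(cost):
    """Minimum-cost assignment of each row to a distinct column (rows <= cols).

    Returns a list mapping row index -> column index.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "need at least as many queries as ground-truth words"
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials (1-based)
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row currently assigned to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the stored alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def text_cost(word, char_probs, max_len):
    """Mean negative log-probability of `word`, padded to `max_len`, under one query."""
    target = word[:max_len].ljust(max_len, PAD)
    nll = sum(-math.log(max(char_probs[t].get(ch, 0.0), 1e-9))
              for t, ch in enumerate(target))
    return nll / max_len


def matching_cost(gt_words, query_preds, max_len, gt_boxes=None, box_weight=1.0):
    """Cost matrix (GT words x queries); the box term is used only if boxes are given."""
    cost = []
    for g, word in enumerate(gt_words):
        row = []
        for q in query_preds:
            c = text_cost(word, q["chars"], max_len)
            if gt_boxes is not None:  # fully supervised: also penalize localization error
                c += box_weight * sum(abs(a - b) for a, b in zip(gt_boxes[g], q["box"]))
            row.append(c)
        cost.append(row)
    return cost


def make_pred(word, max_len, box=(0.0, 0.0, 0.0, 0.0), conf=0.9):
    """Hypothetical toy prediction putting probability `conf` on each character of `word`."""
    target = word[:max_len].ljust(max_len, PAD)
    return {"chars": [{ch: conf} for ch in target], "box": box}
```

For example, with ground-truth words `["cat", "dog"]` and three queries predicting "dog", "xyz" and "cat", the text-only matching assigns `cat` to query 2 and `dog` to query 0, with no localization needed. In a DETR-style loss, unassigned queries would additionally be supervised as "no text"; that term is omitted here, and in practice `scipy.optimize.linear_sum_assignment` could replace the hand-written `hungarian`.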
