Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer
Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar, R. Manmatha, Pietro Perona

TL;DR
This paper introduces TextTranSpotter, a transformer-based text spotting framework capable of training with both fully- and weakly-supervised data, reducing annotation costs while maintaining high performance.
Contribution
It presents the first text spotting framework that can be trained in a weakly-supervised setting. The framework combines a transformer-based architecture with a novel loss function based on the Hungarian loss, so training can proceed with only transcription annotations.
Findings
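Trained with only text transcription annotations on real data, the weakly-supervised model achieves performance competitive with previous state-of-the-art fully-supervised methods.
In fully-supervised mode, the model reaches state-of-the-art performance.
The approach reduces the need for expensive localization annotations.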
Abstract
Text spotting end-to-end methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually have a distinct separation between the detection and recognition branches, requiring exact annotations for the two tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework which may be trained with both fully- and weakly-supervised settings. By learning a single latent representation per word detection, and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves competitive performance with previous state-of-the-art fully-supervised methods. When trained…
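The matching idea behind a Hungarian-style loss can be illustrated with a minimal sketch. This is not the authors' loss function: it assumes the matching cost is the edit distance between each decoded prediction and each ground-truth transcription, and it finds the optimal one-to-one assignment by exhaustive search, which agrees with the Hungarian algorithm on tiny inputs. The names `edit_distance` and `match_by_text` are hypothetical helpers introduced only for this illustration.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def match_by_text(predictions, targets):
    """Assign each target transcription to a distinct prediction at minimum total cost.

    Assumption: the cost uses text only (no boxes), mimicking a setting where
    localization annotations are unavailable. Exhaustive search stands in for
    the Hungarian algorithm and is only practical for a handful of items.
    """
    if len(predictions) < len(targets):
        raise ValueError("need at least as many predictions as targets")
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(len(predictions)), len(targets)):
        cost = sum(edit_distance(predictions[p], targets[t])
                   for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, t) for t, p in enumerate(perm)]
    return best_pairs, best_cost


# Four hypothetical decoded query outputs and two ground-truth words.
predictions = ["exit", "st0p", "cafe", ""]
targets = ["cafe", "stop"]
pairs, cost = match_by_text(predictions, targets)
print(pairs, cost)  # (prediction index, target index) pairs and total cost
```

In a training loop, the matched pairs would determine which prediction receives the loss for which transcription, while unmatched predictions would be treated as "no word". For realistic numbers of queries, a polynomial-time solver for the assignment problem would replace the exhaustive search.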
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
