Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance

Alloy Das, Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal, and Saumik Bhattacharya

arXiv:2310.00917·cs.CV·November 2, 2023

TL;DR

This paper explores multi-lingual datasets and intermediate feature representations to improve domain adaptation in scene text spotting, demonstrating significant accuracy and efficiency gains across diverse benchmarks.

Contribution

It introduces a domain-adaptive training approach using multi-domain data and evaluates a transformer-based model, Swin-TESTR, for improved scene text spotting.

Findings

1. Intermediate representations enhance performance across domains.
2. Multi-lingual and multi-domain training improves adaptability.
3. Swin-TESTR achieves state-of-the-art results in accuracy and efficiency.

Abstract

The ability to adapt to a wide range of domains is crucial for scene text spotting models deployed in real-world conditions. However, existing state-of-the-art (SOTA) approaches usually combine scene text detection and recognition simply by pretraining on natural scene text datasets, which does not directly exploit the intermediate feature representations across multiple domains. Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data so that it can adapt directly to target domains rather than being specialized for a specific domain or scenario. Further, we investigate a transformer baseline called Swin-TESTR that addresses scene text spotting for both regular and arbitrary-shaped text, along with an exhaustive evaluation. The results clearly demonstrate the potential of intermediate…
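The abstract frames domain-adaptive spotting as training one model on data pooled from several source domains. As a rough illustration only, the snippet below interleaves samples from multiple source datasets using fixed mixing weights. The function name `mixed_domain_stream`, the placeholder domain names, and the weighted-sampling scheme are all hypothetical assumptions; this is not the sampling strategy of Swin-TESTR or of the alloydas/testr_eval repository.

```python
import random
from typing import Dict, Iterator, Sequence, Tuple


def mixed_domain_stream(
    domains: Dict[str, Sequence[str]],
    weights: Dict[str, float],
    num_samples: int,
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Hypothetical multi-domain sampler (illustrative assumption only).

    At each step, pick a source domain with probability proportional to
    its weight, then draw one sample uniformly from that domain.
    Yields (domain_name, sample) pairs.
    """
    rng = random.Random(seed)  # seeded for reproducible mixing
    names = list(domains)
    probs = [weights[name] for name in names]  # KeyError if a weight is missing
    for _ in range(num_samples):
        name = rng.choices(names, weights=probs, k=1)[0]
        yield name, rng.choice(domains[name])


if __name__ == "__main__":
    # Placeholder domains standing in for, e.g., scene, multilingual and synthetic data.
    pools = {
        "scene": ["img_s1", "img_s2"],
        "multilingual": ["img_m1", "img_m2", "img_m3"],
        "synthetic": ["img_y1"],
    }
    mix = {"scene": 0.5, "multilingual": 0.3, "synthetic": 0.2}
    for domain, sample in mixed_domain_stream(pools, mix, num_samples=5):
        print(domain, sample)
```

Sampling per step, rather than concatenating datasets end to end, keeps every mini-batch drawn from the full domain mixture; in a PyTorch pipeline the same idea is usually expressed with a weighted sampler over a concatenated dataset.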

Code & Models

Repositories

alloydas/testr_eval (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques

Methods: Focus