FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting

Alloy Das; Sanket Biswas; Umapada Pal; Josep Lladós; Saumik Bhattacharya

arXiv:2408.14998·cs.CV·March 13, 2025

TL;DR

FastTextSpotter introduces a high-efficiency transformer-based framework for multilingual scene text spotting, achieving superior accuracy and speed across diverse datasets while maintaining robustness in recognizing complex text shapes.

Contribution

The paper presents SAC2, a novel, faster self-attention unit integrated into a transformer architecture, which improves efficiency without sacrificing accuracy.

Findings

01. Reports state-of-the-art accuracy on scene text benchmarks, including ICDAR2015, CTW1500, and TotalText.

02. Demonstrates improved processing speed over existing models.

03. Effectively recognizes multilingual (English and Vietnamese) and arbitrarily shaped text.

Abstract

The proliferation of scene text in both structured and unstructured environments presents significant challenges in optical character recognition (OCR), necessitating more efficient and robust text spotting solutions. This paper presents FastTextSpotter, a framework that integrates a Swin Transformer visual backbone with a Transformer Encoder-Decoder architecture, enhanced by a novel, faster self-attention unit, SAC2, to improve processing speeds while maintaining accuracy. FastTextSpotter has been validated across multiple datasets, including ICDAR2015 for regular texts and CTW1500 and TotalText for arbitrary-shaped texts, benchmarking against current state-of-the-art models. Our results indicate that FastTextSpotter not only achieves superior accuracy in detecting and recognizing multilingual scene text (English and Vietnamese) but also improves model efficiency, thereby setting new…

Code & Models

Repositories

alloydas/fast-textspotter (PyTorch, official)

Taxonomy

Topics: Video Analysis and Summarization · Multimedia Communication and Technology · Music and Audio Processing

Methods: Attention Is All You Need · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Absolute Position Encodings · Softmax · Linear Layer · Label Smoothing