FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting
Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós, Saumik Bhattacharya

TL;DR
FastTextSpotter introduces a high-efficiency transformer-based framework for multilingual scene text spotting, achieving superior accuracy and speed across diverse datasets while maintaining robustness in recognizing complex text shapes.
Contribution
It presents a novel, faster self-attention unit, SAC2, integrated into a transformer architecture, significantly improving efficiency without sacrificing accuracy.
Findings
Achieves state-of-the-art accuracy on multiple scene text datasets.
Demonstrates improved processing speed over existing models.
Effectively recognizes multilingual and arbitrarily shaped texts.
Abstract
The proliferation of scene text in both structured and unstructured environments presents significant challenges in optical character recognition (OCR), necessitating more efficient and robust text spotting solutions. This paper presents FastTextSpotter, a framework that integrates a Swin Transformer visual backbone with a Transformer Encoder-Decoder architecture, enhanced by a novel, faster self-attention unit, SAC2, to improve processing speeds while maintaining accuracy. FastTextSpotter has been validated across multiple datasets, including ICDAR2015 for regular texts and CTW1500 and TotalText for arbitrary-shaped texts, benchmarking against current state-of-the-art models. Our results indicate that FastTextSpotter not only achieves superior accuracy in detecting and recognizing multilingual scene text (English and Vietnamese) but also improves model efficiency, thereby setting new…
Taxonomy
Topics: Video Analysis and Summarization · Multimedia Communication and Technology · Music and Audio Processing
Methods: Attention Is All You Need · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Absolute Position Encodings · Softmax · Linear Layer · Label Smoothing
