TextMamba: Scene Text Detector with Mamba
Qiyan Zhao, Yue Yan, Da-Han Wang

TL;DR
TextMamba is a scene text detection method that combines the Mamba state space model with attention layers to improve long-range dependency modeling, and it reports state-of-the-art results on multiple benchmarks.
Contribution
The paper proposes integrating Mamba, a state space model, with attention layers for scene text detection, enhancing long-range dependency modeling and multi-scale feature fusion.
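Mamba's key idea is a "selection mechanism": the state space parameters (step size, input and output projections) are computed from each input token, so the model can decide per token what to keep in its hidden state and what to forget, at a cost linear in sequence length. The NumPy sketch below shows a simplified selective scan under stated assumptions: a diagonal state per channel, a softplus step size, and Mamba's simplified discretization of B. The function and weight names are hypothetical, and this is not TextMamba's encoder.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return np.logaddexp(0.0, z)


def selective_scan(x, A, W_delta, b_delta, W_B, W_C, D_skip):
    """Simplified Mamba-style selective SSM over one sequence (illustrative only).

    x:        (L, D) input sequence (e.g. flattened feature-map tokens)
    A:        (D, N) negative entries; each channel has a diagonal N-dim state
    W_delta:  (D, D), b_delta: (D,) -> input-dependent step size delta_t
    W_B, W_C: (D, N)                -> input-dependent B_t and C_t
    D_skip:   (D,) skip-connection weights
    Returns y: (L, D). Cost is linear in L: one state update per token.
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))
    y = np.empty((L, D))
    for t in range(L):
        xt = x[t]
        delta = softplus(xt @ W_delta + b_delta)   # (D,) selection: step depends on the token
        B = xt @ W_B                               # (N,) input-dependent input projection
        C = xt @ W_C                               # (N,) input-dependent output projection
        A_bar = np.exp(delta[:, None] * A)         # (D, N) zero-order-hold discretization of A
        B_bar = delta[:, None] * B[None, :]        # (D, N) simplified discretization of B
        h = A_bar * h + B_bar * xt[:, None]        # decay old state, write the new input
        y[t] = h @ C + D_skip * xt                 # (D,) readout plus skip connection
    return y


# Usage with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
L, D, N = 6, 4, 3
x = rng.standard_normal((L, D))
A = -np.exp(rng.standard_normal((D, N)))           # negative, so A_bar lies in (0, 1)
W_delta = 0.1 * rng.standard_normal((D, D))
b_delta = np.zeros(D)
W_B = rng.standard_normal((D, N))
W_C = rng.standard_normal((D, N))
D_skip = np.ones(D)
y = selective_scan(x, A, W_delta, b_delta, W_B, W_C, D_skip)
print(y.shape)  # (6, 4)
```

Because the step size is large for some tokens and small for others, A_bar can almost erase the state (forget) or almost preserve it (remember), which is the input-dependent behavior a fixed attention or convolution kernel lacks. Real Mamba implementations replace the Python loop with a hardware-aware parallel scan.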
Findings
Achieves 89.7% F-measure on CTW1500
Achieves 89.2% F-measure on Total-Text
Achieves 78.5% F-measure on ICDAR19-ArT
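The findings are stated as F-measure, the harmonic mean of detection precision P and recall R, F = 2PR / (P + R), so a high score requires both few false detections and few missed text instances. A minimal sketch; the precision and recall values in the usage line are made up for illustration and are not the paper's.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only, not taken from the paper.
print(round(f_measure(0.90, 0.80), 4))  # 0.8471
```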
Abstract
In scene text detection, Transformer-based methods have addressed the global feature extraction limitations inherent in traditional convolutional neural network (CNN)-based methods. However, most of them directly use native Transformer attention layers as encoders without evaluating their cross-domain limitations and inherent shortcomings: when modeling long-range dependencies for text detection, they may forget important information or focus on irrelevant representations. The recently proposed state space model Mamba has demonstrated better long-range dependency modeling through a selection mechanism with linear complexity. Therefore, we propose a novel scene text detector based on Mamba that integrates the selection mechanism with attention layers, enhancing the encoder's ability to extract relevant information from long sequences. We adopt the Top-k algorithm to explicitly select key information and…
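The abstract says a Top-k step is used to explicitly select key information, but the visible text does not specify where or how it is applied. One common way to realize the idea is top-k sparse attention: for each query, keep only the k highest attention scores, mask the rest, and renormalize. The NumPy sketch below assumes that formulation; `topk_attention` and its arguments are hypothetical names, not the authors' code.

```python
import numpy as np


def topk_attention(Q, K, V, k):
    """Scaled dot-product attention that keeps only the k largest scores per query.

    Q: (Lq, d), K: (Lk, d), V: (Lk, dv), with 1 <= k <= Lk.
    Returns (output of shape (Lq, dv), attention weights of shape (Lq, Lk)).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                          # (Lq, Lk)
    # Column indices of the k largest scores in each row (unordered).
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    # Everything outside the top-k becomes -inf, so it gets zero weight after softmax.
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx, np.take_along_axis(scores, idx, axis=-1), axis=-1)
    masked = masked - masked.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(masked)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights


rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))
K = rng.standard_normal((10, 8))
V = rng.standard_normal((10, 4))
out, w = topk_attention(Q, K, V, k=3)
print(out.shape)             # (5, 4)
print((w > 0).sum(axis=-1))  # 3 nonzero weights per query
```

With k equal to the number of keys, the function reduces to ordinary softmax attention; a smaller k forces each query to ignore its low-scoring tokens, which targets the "focusing on irrelevant representations" shortcoming the abstract attributes to plain attention.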
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Advanced Neural Network Applications
