MixNet: Toward Accurate Detection of Challenging Scene Text in the Wild
Yu-Xiang Zeng, Jun-Wei Hsieh, Xin Li, Ming-Ching Chang

TL;DR
MixNet is a hybrid CNN-Transformer architecture designed to improve the detection of small, challenging scene text in natural images, outperforming existing methods across multiple datasets.
Contribution
Introduces MixNet, combining FSNet and CTBlock modules, with a novel feature shuffling strategy and center line features, achieving state-of-the-art results in scene text detection.
Findings
The FSNet backbone achieves significant improvements over many existing text detection methods.
Outperforms popular models like PAN, DB, and FAST.
Achieves state-of-the-art results on multiple datasets.
Abstract
Detecting small scene text instances in the wild is particularly challenging, where the influence of irregular positions and nonideal lighting often leads to detection errors. We present MixNet, a hybrid architecture that combines the strengths of CNNs and Transformers, capable of accurately detecting small text from challenging natural scenes, regardless of the orientations, styles, and lighting conditions. MixNet incorporates two key modules: (1) the Feature Shuffle Network (FSNet) to serve as the backbone and (2) the Central Transformer Block (CTBlock) to exploit the 1D manifold constraint of the scene text. We first introduce a novel feature shuffling strategy in FSNet to facilitate the exchange of features across multiple scales, generating high-resolution features superior to popular ResNet and HRNet. The FSNet backbone has achieved significant improvements over many existing text…
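The feature shuffling idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: each scale's channels are split into as many equal groups as there are scales, group j of every scale is resized (nearest neighbour) to scale j's resolution, and the groups are concatenated there. The names `resize_nearest`, `shuffle_across_scales`, and `constant_map` are hypothetical and do not come from the authors' code.

```python
# Hypothetical sketch of cross-scale feature shuffling (not the authors' code).
# A feature map is a list of channels; each channel is a 2D list (H x W).

def resize_nearest(channel, out_h, out_w):
    """Nearest-neighbour resize of one 2D channel to (out_h, out_w)."""
    in_h, in_w = len(channel), len(channel[0])
    return [
        [channel[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def shuffle_across_scales(features):
    """Assumed scheme: split each scale's channels into len(features) equal
    groups, then send group j of every scale to scale j at its resolution."""
    n = len(features)
    for fmap in features:
        if len(fmap) % n != 0:
            raise ValueError("channel count must be divisible by number of scales")
    mixed = []
    for j, target in enumerate(features):
        out_h, out_w = len(target[0]), len(target[0][0])
        channels = []
        for source in features:
            g = len(source) // n              # channels per group at this scale
            group = source[j * g:(j + 1) * g]  # group destined for scale j
            channels.extend(resize_nearest(ch, out_h, out_w) for ch in group)
        mixed.append(channels)
    return mixed

def constant_map(value, channels, h, w):
    """Toy feature map whose entries all equal `value` (marks its origin)."""
    return [[[value] * w for _ in range(h)] for _ in range(channels)]

# Three toy scales: 4x4, 2x2, and 1x1, each with 3 channels.
features = [constant_map(0, 3, 4, 4), constant_map(1, 3, 2, 2), constant_map(2, 3, 1, 1)]
mixed = shuffle_across_scales(features)

# Every output scale now holds one channel from each input scale.
for scale in mixed:
    print(len(scale), "channels of size", len(scale[0]), "x", len(scale[0][0]),
          "origins:", [ch[0][0] for ch in scale])
```

In this toy setup each output scale keeps its own resolution but carries channels that originated at every input scale, which is the cross-scale exchange the abstract describes. A real backbone would use learned convolutions and interpolation on tensors rather than nested lists.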
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Neural Network Applications · Vehicle License Plate Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Average Pooling · Kaiming Initialization · 1x1 Convolution · Batch Normalization · Global Average Pooling · Linear Layer · Grouped Convolution
