Adaptive Shrink-Mask for Text Detection
Chuang Yang, Mulin Chen, Yuan Yuan, Qi Wang, Xuelong Li

TL;DR
The paper introduces an adaptive shrink-mask-based text detection network that improves accuracy and robustness while keeping real-time speed. It does so by weakening the detector's dependence on shrink-masks and by using surrounding-pixel information during training.
Contribution
It proposes the Adaptive Shrink-Mask (ASM) and Super-pixel Window (SPW) to improve text detection robustness and accuracy, with a lightweight architecture for efficiency.
Findings
Outperforms state-of-the-art methods in accuracy and speed.
Improves detection robustness by weakening coupling to shrink-masks.
Uses surrounding pixel context during training, not testing.
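The findings say surrounding-pixel context is used only during training. This summary does not describe how the Super-pixel Window (SPW) works, so the sketch below is a generic, hypothetical stand-in and not SPW itself. Each binary shrink-mask label is replaced by the mean label inside a k × k window, and that soft target is added as an auxiliary loss term. Inference only thresholds the per-pixel prediction, so the window computation adds no test-time cost. The function names, the window size `k`, and the loss `weight` are illustrative assumptions.

```python
# Hypothetical illustration only: NOT the paper's Super-pixel Window (SPW).
# Shows one generic way to let pixelwise supervision see neighbouring pixels
# at training time while leaving inference untouched.
import math
from typing import List

Grid = List[List[float]]


def window_mean_targets(mask: Grid, k: int = 3) -> Grid:
    """Replace each label by the mean label in a k x k window (clipped at borders)."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    h, w, r = len(mask), len(mask[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [mask[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def bce(pred: Grid, target: Grid, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; targets may be soft values in [0, 1]."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            n += 1
    return total / n


def training_loss(pred: Grid, mask: Grid, k: int = 3, weight: float = 0.5) -> float:
    # Pixelwise term plus a context term built from the window-averaged mask.
    return bce(pred, mask) + weight * bce(pred, window_mean_targets(mask, k))


def inference(pred: Grid, threshold: float = 0.5) -> List[List[int]]:
    # Test time: plain thresholding, no window computation at all.
    return [[1 if p > threshold else 0 for p in row] for row in pred]
```

Keeping the context term purely in the loss is what makes it free at inference: the network architecture and the post-processing are unchanged.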
Abstract
Existing real-time text detectors reconstruct text contours directly from shrink-masks, which simplifies the framework and lets the model run fast. However, the strong dependence on predicted shrink-masks leads to unstable detection results. Moreover, discriminating shrink-masks is a pixelwise prediction task, so supervising the network with shrink-masks alone loses much semantic context, which leads to false detections of shrink-masks. To address these problems, we construct an efficient text detection network, Adaptive Shrink-Mask for Text Detection (ASMTD), which improves accuracy during training and reduces the complexity of the inference process. First, the Adaptive Shrink-Mask (ASM) is proposed to represent texts by shrink-masks and independent adaptive offsets. It weakens the coupling of texts to shrink-masks, which improves the robustness of detection results.…
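The abstract contrasts reconstructing text contours directly from shrink-masks with ASM's representation of a shrink-mask plus an independent adaptive offset. The sketch below illustrates that contrast under loudly stated assumptions: axis-aligned boxes instead of polygons; the DBNet/PSENet-style shrink rule D = A(1 − r²)/L and the fixed unclip rule D′ = A′·r′/L′ as the baseline; and a hypothetical `predicted_offset` argument standing in for the network's offset output. It is not the paper's implementation, and the class and function names are illustrative.

```python
# Sketch under assumptions: axis-aligned boxes only; the shrink/unclip formulas
# are the common DBNet/PSENet-style rules used as a baseline, and
# `predicted_offset` is a hypothetical stand-in for a separately regressed offset.
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def perimeter(self) -> float:
        return 2 * ((self.x1 - self.x0) + (self.y1 - self.y0))

    def offset(self, d: float) -> "Box":
        # Positive d grows the box on every side, negative d shrinks it.
        return Box(self.x0 - d, self.y0 - d, self.x1 + d, self.y1 + d)


def shrink_distance(text: Box, r: float = 0.4) -> float:
    # Label generation: D = A * (1 - r^2) / L.
    return text.area() * (1 - r ** 2) / text.perimeter()


def reconstruct_fixed_ratio(shrunk: Box, unclip: float = 1.5) -> Box:
    # Baseline: D' = A' * r' / L' is derived from the predicted shrink-mask
    # alone, so any shrink-mask error also corrupts the expansion distance.
    return shrunk.offset(shrunk.area() * unclip / shrunk.perimeter())


def reconstruct_adaptive(shrunk: Box, predicted_offset: float) -> Box:
    # Adaptive idea: the expansion distance comes from a separate prediction,
    # not from the shrink-mask geometry.
    return shrunk.offset(predicted_offset)


if __name__ == "__main__":
    gt = Box(0, 0, 100, 20)
    d = shrink_distance(gt)  # about 7.0
    shrunk = gt.offset(-d)   # ideal shrink-mask
    # Simulate a shrink-mask predicted 2 px too thin.
    noisy = Box(shrunk.x0, shrunk.y0 + 1, shrunk.x1, shrunk.y1 - 1)
    print(reconstruct_fixed_ratio(noisy))
    print(reconstruct_adaptive(noisy, predicted_offset=d))
```

With the 2 px thinner shrink-mask, the fixed-ratio expansion distance also drops (from about 4.2 to about 2.9), so the reconstructed box is only about 9.7 px tall against a true height of 20. The independently predicted offset still gives 18 px. This is the kind of decoupling the abstract credits for more robust results.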
Taxonomy
Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Processing and 3D Reconstruction
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
