Adaptive Shrink-Mask for Text Detection

Chuang Yang; Mulin Chen; Yuan Yuan; Qi Wang; Xuelong Li

arXiv:2111.09560·cs.CV·November 19, 2021

Open Access

TL;DR

The paper introduces a text detection network based on adaptive shrink-masks that improves accuracy and robustness while maintaining real-time speed. It does so by weakening the dependence on shrink-masks and by using surrounding pixel information during training.

Contribution

It proposes the Adaptive Shrink-Mask (ASM) and Super-pixel Window (SPW) to improve text detection robustness and accuracy, with a lightweight architecture for efficiency.

Findings

01. Outperforms state-of-the-art methods in accuracy and speed.

02. Improves detection robustness by weakening coupling to shrink-masks.

03. Uses surrounding pixel context during training, not testing.

Abstract

Existing real-time text detectors reconstruct text contours directly from shrink-masks, which simplifies the framework and lets the model run fast. However, this strong dependence on predicted shrink-masks makes detection results unstable. Moreover, shrink-mask discrimination is a pixel-wise prediction task, so supervising the network with shrink-masks alone loses much semantic context and leads to false shrink-mask detections. To address these problems, we construct an efficient text detection network, Adaptive Shrink-Mask for Text Detection (ASMTD), which improves accuracy during training and reduces the complexity of the inference process. First, the Adaptive Shrink-Mask (ASM) is proposed to represent texts by shrink-masks and independent adaptive offsets. It weakens the coupling of texts to shrink-masks, which improves the robustness of detection results.…
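To make the shrink-mask idea in the abstract concrete, here is a minimal Python sketch of how a shrink-mask can be derived from a text box and how a contour can be rebuilt from it with an offset. Everything specific in it is an assumption for illustration and is not taken from this page: the area-over-perimeter offset rule is the one commonly used by earlier shrink-mask detectors, the shrink ratio of 0.5 is arbitrary, text regions are simplified to axis-aligned rectangles, and all function names are hypothetical. The page does not say how ASM defines or predicts its adaptive offsets, so `rebuild_contour` simply takes the offset as an input.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; works for clockwise or counter-clockwise vertices."""
    total = 0.0
    for i, (x0, y0) in enumerate(poly):
        x1, y1 = poly[(i + 1) % len(poly)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def polygon_perimeter(poly: List[Point]) -> float:
    """Sum of edge lengths around the closed polygon."""
    total = 0.0
    for i, (x0, y0) in enumerate(poly):
        x1, y1 = poly[(i + 1) % len(poly)]
        total += math.hypot(x1 - x0, y1 - y0)
    return total


def shrink_offset(poly: List[Point], ratio: float) -> float:
    """Assumed offset rule: d = area * (1 - ratio^2) / perimeter."""
    return polygon_area(poly) * (1.0 - ratio ** 2) / polygon_perimeter(poly)


def rect_to_poly(rect: Rect) -> List[Point]:
    x0, y0, x1, y1 = rect
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def offset_rect(rect: Rect, d: float) -> Rect:
    """Move every side outward by d (a negative d shrinks the rectangle)."""
    x0, y0, x1, y1 = rect
    return (x0 - d, y0 - d, x1 + d, y1 + d)


def shrink_mask_rect(text_box: Rect, ratio: float) -> Rect:
    """Training-target style shrink-mask for a rectangular text box."""
    d = shrink_offset(rect_to_poly(text_box), ratio)
    return offset_rect(text_box, -d)


def rebuild_contour(shrink_mask: Rect, offset: float) -> Rect:
    """Rebuild a text contour by expanding the shrink-mask by `offset`.

    Hypothetical stand-in: in an ASM-like scheme the offset would be
    predicted separately from the mask rather than derived from it.
    """
    return offset_rect(shrink_mask, offset)


if __name__ == "__main__":
    box = (0.0, 0.0, 100.0, 20.0)
    d = shrink_offset(rect_to_poly(box), ratio=0.5)
    mask = shrink_mask_rect(box, ratio=0.5)
    print("offset:", d)
    print("shrink-mask:", mask)
    print("rebuilt contour:", rebuild_contour(mask, d))
```

In this sketch the rebuilt contour matches the original box only when the offset passed to `rebuild_contour` matches the one used for shrinking. An offset supplied independently of the mask, as the abstract describes, means the contour no longer depends solely on the predicted mask.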

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Processing and 3D Reconstruction

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings