Zoom Text Detector
Chuang Yang, Mulin Chen, Yuan Yuan, and Qi Wang

TL;DR
The paper introduces Zoom Text Detector (ZTD), a novel method that improves text detection accuracy by addressing shrink-mask reliability issues through zoom-inspired modules and a visual discriminator.
Contribution
The paper proposes ZTD, which pairs a Zoom Out module for stronger semantic feature extraction with a Zoom In module for better margin recognition, and adds a Sequential-Visual Discriminator to reduce false positives.
Findings
ZTD achieves superior detection accuracy and robustness.
The modules effectively address shrink-mask ambiguities.
Experimental results demonstrate improved comprehensive performance, that is, a better balance of detection accuracy and speed.
Abstract
To pursue comprehensive performance, recent text detectors improve detection speed at the expense of accuracy. They adopt shrink-mask based text representation strategies, which leads to a high dependency of detection accuracy on shrink-masks. Unfortunately, three disadvantages cause unreliable shrink-masks. Specifically, these methods try to strengthen the discrimination of shrink-masks from the background by semantic information. However, the feature defocusing phenomenon that coarse layers are optimized by fine-grained objectives limits the extraction of semantic features. Meanwhile, since both shrink-masks and the margins belong to texts, the detail loss phenomenon that the margins are ignored hinders the distinguishment of shrink-masks from the margins, which causes ambiguous shrink-mask edges. Moreover, false-positive samples enjoy similar visual features with shrink-masks. They…
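To make the shrink-mask representation mentioned in the abstract concrete, the sketch below computes the inward offset commonly used to shrink a text polygon, D = A(1 - r^2) / L, where A is the polygon area, L its perimeter, and r a shrink ratio. This formula comes from earlier shrink-mask detectors such as PSENet and DBNet; whether ZTD uses exactly this rule, and the ratio of 0.4, are assumptions here, and the function names are hypothetical.

```python
from math import hypot


def polygon_area(points):
    """Absolute polygon area via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += hypot(x1 - x0, y1 - y0)
    return total


def shrink_offset(points, ratio=0.4):
    """Inward offset D = A * (1 - r^2) / L (assumed rule, not confirmed for ZTD)."""
    return polygon_area(points) * (1.0 - ratio ** 2) / polygon_perimeter(points)


# Hypothetical text box: 100 px wide, 20 px tall.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
offset = shrink_offset(box, ratio=0.4)
print(f"shrink offset: {offset:.2f} px")
```

For this box the offset is about 7 px, so the shrink-mask keeps only about 6 of the 20 px of text height. The remaining band is the margin, which still belongs to the text; that is why ambiguous shrink-mask edges translate directly into inaccurate boxes once the mask is expanded back.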
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
