Zoom Text Detector

Chuang Yang, Mulin Chen, Yuan Yuan, and Qi Wang

arXiv:2209.03014·cs.CV·September 8, 2022

TL;DR

The paper introduces Zoom Text Detector (ZTD), a novel method that improves text detection accuracy by addressing shrink-mask reliability issues through zoom-inspired modules and a visual discriminator.

Contribution

It proposes ZTD with Zoom Out and Zoom In modules for better feature extraction and margin recognition, and a Sequential-Visual Discriminator to reduce false positives.

Findings

01. ZTD achieves superior detection accuracy and robustness.

02. The modules effectively address shrink-mask ambiguities.

03. Experimental results demonstrate improved comprehensive performance.

Abstract

To pursue comprehensive performance, recent text detectors improve detection speed at the expense of accuracy. They adopt shrink-mask based text representation strategies, which makes detection accuracy highly dependent on the shrink-masks. Unfortunately, three disadvantages make shrink-masks unreliable. Specifically, these methods try to strengthen the discrimination of shrink-masks from the background using semantic information. However, the feature defocusing phenomenon, in which coarse layers are optimized by fine-grained objectives, limits the extraction of semantic features. Meanwhile, since both shrink-masks and the margins belong to texts, the detail loss phenomenon, in which the margins are ignored, makes it hard to distinguish shrink-masks from the margins and causes ambiguous shrink-mask edges. Moreover, false-positive samples share similar visual features with shrink-masks. They…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings