Shift Variance in Scene Text Detection
Markus Glitzner, Jan-Hendrik Neudeck, Philipp Härtinger

TL;DR
This paper investigates the shift variance problem in scene text detection, demonstrating how architectural modifications and smoothing filters can improve shift consistency, and proposes a new metric to quantify this variability.
Contribution
The paper reveals the inherent shift variance of state-of-the-art text detectors and introduces architectural adjustments to improve shift equivariance, along with a new metric to measure it.
Findings
Small architectural changes improve shift equivariance.
Adding smoothing filters significantly enhances shift consistency.
Proposed metric effectively quantifies shift variability in text detectors.
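The summary does not say which smoothing filters the paper uses. One common way to add smoothing is to apply a low-pass blur before each strided subsampling step, as in anti-aliased downsampling. The toy sketch below shows the general effect under stated assumptions: 1-D signals, circular padding, a hypothetical binomial kernel `(0.25, 0.5, 0.25)`, and a hypothetical "peak response spread" score. It moves a unit impulse, a stand-in for a small text feature, across every position and records how much the peak response after stride-2 downsampling varies. This is an illustration of the technique, not the paper's architecture or its metric.

```python
import numpy as np

def blur(x, kernel=(0.25, 0.5, 0.25)):
    """Circular 1-D smoothing with a small binomial low-pass kernel (assumed choice)."""
    r = len(kernel) // 2
    # np.roll(x, r - j)[i] == x[(i + j - r) % n], i.e. a circular correlation.
    return sum(w * np.roll(x, r - j) for j, w in enumerate(kernel))

def downsample(x, stride=2, smooth=False):
    """Stride-s subsampling, optionally preceded by the low-pass filter."""
    if smooth:
        x = blur(x)
    return x[::stride]

def peak_response_spread(n=16, stride=2, smooth=False):
    """Hypothetical score: range of the peak downsampled response
    as a unit impulse is moved across all n input positions."""
    peaks = []
    for p in range(n):
        x = np.zeros(n)
        x[p] = 1.0
        peaks.append(downsample(x, stride, smooth).max())
    return max(peaks) - min(peaks)

print(peak_response_spread(smooth=False))  # 1.0
print(peak_response_spread(smooth=True))   # 0.25
```

Without smoothing, an impulse on an odd position is dropped entirely, so the peak jumps between 1.0 and 0.0 depending on a one-pixel shift. With the blur applied first, part of the impulse always survives subsampling, and the peak only varies between 0.5 and 0.25.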
Abstract
The theory of convolutional neural networks suggests the property of shift equivariance, i.e., that a shifted input causes an equally shifted output. In practice, however, this is not always the case. This poses a serious problem for scene text detection, for which a consistent spatial response is crucial irrespective of the position of the text in the scene. Using a simple synthetic experiment, we demonstrate the inherent shift variance of a state-of-the-art fully convolutional text detector. Furthermore, using the same experimental setting, we show how small architectural changes can lead to improved shift equivariance and less variation in the detector output. We validate the synthetic results using a real-world training schedule on the text detection network. To quantify the amount of shift variability, we propose a metric based on well-established text detection benchmarks. While…
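The gap between theory and practice described in the abstract can be reproduced in a few lines. The sketch below assumes 1-D signals, circular padding, and an arbitrary kernel `[1, -2, 1]`. It is a generic illustration, not the paper's synthetic experiment. A plain convolution commutes with shifts, but once its output is subsampled with stride 2, as in a strided CNN layer, only shifts by a multiple of the stride carry over to the output. A one-sample shift selects a different sampling phase, and the result matches no shift of the original output.

```python
import numpy as np

def circ_conv(x, kernel):
    """Circular 1-D correlation of signal x with an odd-length kernel."""
    r = len(kernel) // 2
    return sum(w * np.roll(x, r - j) for j, w in enumerate(kernel))

def conv_then_stride(x, kernel, stride=2):
    """Convolution followed by stride-s subsampling, as in a strided layer."""
    return circ_conv(x, kernel)[::stride]

def is_shift_of(a, b):
    """True if a equals some circular shift of b."""
    return any(np.allclose(a, np.roll(b, s)) for s in range(len(b)))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
kernel = np.array([1.0, -2.0, 1.0])

# Plain convolution: shifting the input by 1 shifts the output by 1.
conv_ok = np.allclose(circ_conv(np.roll(x, 1), kernel),
                      np.roll(circ_conv(x, kernel), 1))

# Strided layer: a shift by the stride (2) becomes a shift by 1 at the output ...
even_ok = np.allclose(conv_then_stride(np.roll(x, 2), kernel),
                      np.roll(conv_then_stride(x, kernel), 1))

# ... but a shift by 1 samples the other phase and matches no output shift.
odd_ok = is_shift_of(conv_then_stride(np.roll(x, 1), kernel),
                     conv_then_stride(x, kernel))

print(conv_ok, even_ok, odd_ok)  # True True False
```

A deep detector stacks several such strided stages, so the set of input shifts that map cleanly to output shifts shrinks with every downsampling step.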
Taxonomy
Topics: Handwritten Text Recognition Techniques
