STELA: A Real-Time Scene Text Detector with Learned Anchor

Linjie Deng; Yanxiang Gong; Xinchen Lu; Yi Lin; Zheng Ma; Mei Xie

arXiv:1909.07549 · cs.CV · September 24, 2019 · 1 citation

TL;DR

STELA is a real-time, one-stage scene text detector that associates a single reference box with each feature-map location and refines it into a learned anchor. This simplifies anchor design compared to conventional multi-anchor approaches while reaching 26.5 fps at 800p.

Contribution

The paper proposes a novel one-stage scene text detector using learned anchors with a single reference box per location, inspired by two-stage R-CNN frameworks.

Findings

01 Achieves 26.5 fps at 800p resolution.

02 Surpasses all existing anchor-based scene text detectors.

03 Demonstrates competitive performance on public benchmarks.

Abstract

To achieve high coverage of target boxes, conventional one-stage anchor-based detectors commonly utilize multiple priors at each spatial position, especially in scene text detection tasks. In this work, we present a simple and intuitive method for multi-oriented text detection in which each location of the feature maps is associated with only one reference box. The idea is inspired by the two-stage R-CNN framework, which can estimate the location of objects of any shape by using learned proposals. The aim of our method is to integrate this mechanism into a one-stage detector and to employ the learned anchor, obtained through a regression operation, in place of the original one in the final predictions. Based on RetinaNet, our method achieves competitive performance on several public benchmarks with fully real-time efficiency (26.5 fps at 800p), which surpasses all of…
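The learned-anchor idea in the abstract can be illustrated with a minimal sketch. It assumes the standard R-CNN offset parameterization (dx, dy, dw, dh) on axis-aligned (cx, cy, w, h) boxes; the paper targets multi-oriented text, so orientation handling is omitted here for brevity. The function name `decode` and all numeric values are hypothetical and are not taken from the paper or from the xhzdeng/stela repository.

```python
import math

def decode(box, deltas):
    """Apply R-CNN-style offsets (dx, dy, dw, dh) to a (cx, cy, w, h) box."""
    cx, cy, w, h = box
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

# A single square reference box at one feature-map location (hypothetical values).
reference_box = (100.0, 100.0, 32.0, 32.0)

# Step 1: a first regression output refines the reference box.
# The decoded result plays the role of the "learned anchor".
anchor_deltas = (0.5, 0.0, math.log(4.0), 0.0)  # stand-in for a network output
learned_anchor = decode(reference_box, anchor_deltas)

# Step 2: the final regression is decoded against the learned anchor
# instead of the original reference box.
final_deltas = (0.0, 0.125, 0.0, math.log(0.5))  # stand-in for a network output
final_box = decode(learned_anchor, final_deltas)

print("learned anchor:", learned_anchor)
print("final box:", final_box)
```

In this sketch the square reference box is first stretched into a wide, text-like anchor, and the final prediction only has to make a small correction relative to that anchor. That is the intuition behind using one reference box per location rather than many hand-designed priors.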

Code & Models

Repositories

xhzdeng/stela
PyTorch · Official

Taxonomy

Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Focal Loss · 1x1 Convolution · Feature Pyramid Network · RetinaNet · Support Vector Machine · Max Pooling · Convolution · R-CNN