STELA: A Real-Time Scene Text Detector with Learned Anchor
Linjie Deng, Yanxiang Gong, Xinchen Lu, Yi Lin, Zheng Ma, Mei Xie

TL;DR
STELA is a real-time scene text detector that associates a single reference box with each feature-map location and refines it into a learned anchor, which simplifies anchor design relative to conventional multi-anchor detectors while keeping accuracy competitive.
Contribution
The paper proposes a novel one-stage scene text detector using learned anchors with a single reference box per location, inspired by two-stage R-CNN frameworks.
Findings
Achieves 26.5 fps at 800p resolution.
Reported to surpass all existing anchor-based scene text detectors.
Demonstrates competitive performance on public benchmarks.
Abstract
To achieve high coverage of target boxes, a common strategy of conventional one-stage anchor-based detectors is to utilize multiple priors at each spatial position, especially in scene text detection tasks. In this work, we present a simple and intuitive method for multi-oriented text detection where each location of the feature maps is associated with only one reference box. The idea is inspired by the two-stage R-CNN framework, which can estimate the location of objects of any shape by using learned proposals. The aim of our method is to integrate this mechanism into a one-stage detector and use the learned anchor, which is obtained through a regression operation, in place of the original one for the final predictions. Based on RetinaNet, our method achieves competitive performance on several public benchmarks with real-time efficiency (26.5 fps at 800p), which surpasses all of…
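The two-step idea in the abstract (one reference box per location, a regression that turns it into a learned anchor, then a final prediction made relative to that learned anchor) can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration only: the function names are hypothetical, the boxes are axis-aligned `(cx, cy, w, h)` rather than the multi-oriented boxes the paper targets, and the delta parametrization is the common R-CNN-style one, not necessarily the one the authors use.

```python
import numpy as np


def make_reference_boxes(feat_h, feat_w, stride, size):
    """Return one square reference box (cx, cy, w, h) per feature-map location.

    Hypothetical helper: a single prior per position instead of several
    scales and aspect ratios.
    """
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    cx = (xs.ravel() + 0.5) * stride  # box centers in image coordinates
    cy = (ys.ravel() + 0.5) * stride
    wh = np.full_like(cx, size, dtype=np.float64)
    return np.stack([cx, cy, wh, wh], axis=1)


def apply_deltas(boxes, deltas):
    """Decode (dx, dy, dw, dh) offsets relative to the given boxes.

    Assumed R-CNN-style parametrization: center shifts are scaled by the
    box size, width and height are scaled exponentially.
    """
    cx = boxes[:, 0] + deltas[:, 0] * boxes[:, 2]
    cy = boxes[:, 1] + deltas[:, 1] * boxes[:, 3]
    w = boxes[:, 2] * np.exp(deltas[:, 2])
    h = boxes[:, 3] * np.exp(deltas[:, 3])
    return np.stack([cx, cy, w, h], axis=1)


def refine(reference, anchor_deltas, box_deltas):
    """Two-step decoding: reference -> learned anchor -> final box."""
    learned = apply_deltas(reference, anchor_deltas)  # first regression
    final = apply_deltas(learned, box_deltas)         # final prediction
    return learned, final
```

In a real detector both sets of deltas would come from convolutional heads on each pyramid level; here they are plain arrays so the decoding order is easy to follow. The point of the sketch is only that the final offsets are decoded against the learned anchor rather than against the original reference box.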
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Focal Loss · 1x1 Convolution · Feature Pyramid Network · RetinaNet · Support Vector Machine · Max Pooling · Convolution · R-CNN
