You Only Recognize Once: Towards Fast Video Text Spotting

Zhanzhan Cheng; Jing Lu; Yi Niu; Shiliang Pu; Fei Wu; Shuigeng Zhou

arXiv:1903.03299 · cs.CV · October 26, 2021 · 6 cites

TL;DR

This paper introduces a fast, robust video text spotting framework that recognizes each localized text stream only once instead of frame by frame, reducing computational cost and improving accuracy over the traditional four-stage pipeline.

Contribution

The authors propose a novel end-to-end trainable text recommender that selects high-quality text for recognition, streamlining the process and enhancing speed and robustness.

Findings

01

Speeds up recognition by 71 times compared to frame-wise methods

02

Achieves state-of-the-art accuracy on public benchmarks

03

Introduces a new large-scale video text dataset (LSVTD)

Abstract

Video text spotting remains an important research topic because of its many real-world applications. Previous approaches usually follow a four-stage pipeline: detecting text in individual images, recognizing the localized text regions frame by frame, tracking text streams, and generating final results with complicated post-processing. Such pipelines can suffer from high computational cost and from interference by low-quality text. In this paper, we propose a fast and robust video text spotting framework that recognizes the localized text only once instead of frame by frame. Specifically, we first obtain text regions in videos with a well-designed spatial-temporal detector. We then develop a novel text recommender that selects the highest-quality text from text streams and recognizes only the selected ones. Here, the recommender assembles text…
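The "recognize once" idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper's recommender is a learned, end-to-end trainable module, whereas this sketch assumes each detected crop already carries a quality score and a stream ID. `TextCrop`, `select_best_per_stream`, `spot_text`, and the stand-in recognizer are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TextCrop:
    stream_id: int    # hypothetical ID linking the same text instance across frames
    frame_index: int  # frame in which the crop was detected
    quality: float    # hypothetical quality score (higher is better)
    image: str        # stand-in for the cropped pixel data


def select_best_per_stream(crops: List[TextCrop]) -> Dict[int, TextCrop]:
    """Keep only the highest-quality crop of each text stream."""
    best: Dict[int, TextCrop] = {}
    for crop in crops:
        current = best.get(crop.stream_id)
        if current is None or crop.quality > current.quality:
            best[crop.stream_id] = crop
    return best


def spot_text(crops: List[TextCrop],
              recognize: Callable[[str], str]) -> Dict[int, str]:
    """Run the recognizer once per stream, on the selected crop only."""
    selected = select_best_per_stream(crops)
    return {sid: recognize(crop.image) for sid, crop in selected.items()}


# Usage: a stand-in recognizer that records how often it is called.
calls: List[str] = []


def fake_recognizer(image: str) -> str:
    calls.append(image)
    return f"text<{image}>"


crops = [
    TextCrop(stream_id=0, frame_index=0, quality=0.2, image="blurry-exit"),
    TextCrop(stream_id=0, frame_index=1, quality=0.9, image="sharp-exit"),
    TextCrop(stream_id=1, frame_index=0, quality=0.7, image="sharp-taxi"),
    TextCrop(stream_id=1, frame_index=1, quality=0.4, image="occluded-taxi"),
]
results = spot_text(crops, fake_recognizer)
```

With four detected crops spread over two streams, the recognizer runs twice rather than four times. A frame-wise pipeline would instead call it once for every crop in every frame.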

Code & Models

Repositories

hikopensource/davar-lab-ocr
PyTorch · Official

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Video Analysis and Summarization