SPTS v2: Single-Point Scene Text Spotting

Yuliang Liu; Jiaxin Zhang; Dezhi Peng; Mingxin Huang; Xinyu Wang; Jingqun Tang; Can Huang; Dahua Lin; Chunhua Shen; Xiang Bai; Lianwen Jin

arXiv:2301.01635 · cs.CV · September 6, 2023

TL;DR

SPTS v2 is a scene text spotting framework trained with single-point annotations. It combines an auto-regressive decoder with a parallel decoder, and it outperforms previous single-point text spotters while using fewer parameters and running faster at inference.

Contribution

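The paper presents SPTS v2, a scene text spotting framework that trains with single-point annotations. It pairs an Instance Assignment Decoder (IAD), which locates text instances, with a Parallel Recognition Decoder (PRD), which recognizes them. Single-point labels lower annotation cost, and parallel recognition shortens the sequence that must be decoded.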

Findings

01 Outperforms previous single-point text spotters on benchmarks.

02 Achieves 19× faster inference than previous single-point text spotters.

03 Uses fewer parameters than previous single-point text spotters.

Abstract

End-to-end scene text spotting has made significant progress due to the intrinsic synergy between text detection and recognition. Previous methods commonly treat manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, and these are much more expensive to produce than single-point annotations. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation. SPTS v2 preserves the advantage of the auto-regressive Transformer through an Instance Assignment Decoder (IAD), which sequentially predicts the center points of all text instances within a single sequence, while a Parallel Recognition Decoder (PRD) recognizes the text of all instances in parallel, which significantly reduces the required sequence length. These two decoders share the same parameters and are interactively…
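
The sequence-length reduction described in the abstract can be illustrated with a rough count of serial decoding steps. The sketch below is not the authors' code. It assumes each center point costs two coordinate tokens, that every transcription is padded to a fixed length, and that the parallel decoder advances all instances together at each step. The function names and the example sizes (10 instances, 25 characters) are hypothetical.

```python
def steps_single_sequence(num_instances: int, text_len: int, point_tokens: int = 2) -> int:
    """Serial steps when points and text share one auto-regressive sequence.

    Assumption: every instance contributes its point tokens followed by
    a transcription padded to text_len tokens.
    """
    return num_instances * (point_tokens + text_len)


def steps_split_decoders(num_instances: int, text_len: int, point_tokens: int = 2) -> int:
    """Serial steps when only the points are decoded sequentially.

    Assumption: the point sequence is decoded first, then all
    transcriptions are decoded side by side, one character position
    per step, so recognition costs text_len steps in total.
    """
    return num_instances * point_tokens + text_len


if __name__ == "__main__":
    n, t = 10, 25  # hypothetical image: 10 text instances, 25-token transcriptions
    print("single sequence:", steps_single_sequence(n, t))
    print("split decoders: ", steps_split_decoders(n, t))
```

Under these assumptions the step count falls from 270 to 45. Actual speed depends on the model and hardware, so this count should not be read as reproducing the reported 19× figure.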

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing · Dropout