SPTS: Single-Point Text Spotting
Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

TL;DR
This paper introduces a novel scene text spotting method that requires only a single point annotation per text instance, significantly reducing annotation costs while achieving state-of-the-art results using an auto-regressive Transformer model.
Contribution
It presents the first approach to train scene text spotting models with only single-point annotations, simplifying data labeling and maintaining high performance.
Findings
Achieves state-of-the-art results on benchmark datasets.
Performance is robust to the position of point annotations.
Reduces annotation cost significantly compared to bounding box methods.
Abstract
Existing scene text spotting (i.e., end-to-end text detection and recognition) methods rely on costly bounding box annotations (e.g., text-line, word-level, or character-level bounding boxes). For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single-point for each instance. We propose an end-to-end scene text spotting method that tackles scene text spotting as a sequence prediction task. Given an image as input, we formulate the desired detection and recognition results as a sequence of discrete tokens and use an auto-regressive Transformer to predict the sequence. The proposed method is simple yet effective, which can achieve state-of-the-art results on widely used benchmarks. Most significantly, we show that the performance is not very sensitive to the positions of the point annotation, meaning that…
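The abstract describes the detection and recognition results as a single sequence of discrete tokens. The following is a minimal sketch of one way such a sequence could be built, not the authors' implementation. The bin count, the fixed transcription length, the character set, the token id layout, and every function name here are assumptions introduced for illustration.

```python
import string

# Hypothetical settings for illustration only; none are stated in the text above.
NUM_BINS = 1000          # number of coordinate quantization bins (assumed)
MAX_TEXT_LEN = 25        # fixed transcription length per instance (assumed)
CHARS = string.ascii_lowercase + string.digits  # assumed character set

# Assumed token id layout: [0, NUM_BINS) coordinates, then characters, then specials.
CHAR_OFFSET = NUM_BINS
PAD = CHAR_OFFSET + len(CHARS)
SOS = PAD + 1
EOS = PAD + 2


def quantize(value, size):
    """Map a pixel coordinate in [0, size) to an integer bin in [0, NUM_BINS)."""
    return min(int(value / size * NUM_BINS), NUM_BINS - 1)


def encode_instance(x, y, text, width, height):
    """Encode one text instance as [x_bin, y_bin, char tokens..., PAD...]."""
    # Characters outside the assumed character set are dropped.
    chars = [CHAR_OFFSET + CHARS.index(c) for c in text.lower() if c in CHARS]
    chars = chars[:MAX_TEXT_LEN]
    chars += [PAD] * (MAX_TEXT_LEN - len(chars))
    return [quantize(x, width), quantize(y, height)] + chars


def encode_image(instances, width, height):
    """Concatenate all instances of an image into one target sequence."""
    seq = [SOS]
    for x, y, text in instances:
        seq += encode_instance(x, y, text, width, height)
    seq.append(EOS)
    return seq


def decode_sequence(seq, width, height):
    """Invert encode_image: recover (x, y, text) triples from a token sequence."""
    body = seq[1:-1]                 # strip SOS and EOS
    step = 2 + MAX_TEXT_LEN          # tokens per instance
    results = []
    for i in range(0, len(body), step):
        chunk = body[i:i + step]
        # Use the bin center as the recovered coordinate.
        x = (chunk[0] + 0.5) / NUM_BINS * width
        y = (chunk[1] + 0.5) / NUM_BINS * height
        text = "".join(CHARS[t - CHAR_OFFSET] for t in chunk[2:] if t != PAD)
        results.append((x, y, text))
    return results


# Usage: two annotated points on a 640x480 image.
sequence = encode_image([(100, 50, "stop"), (320, 240, "exit 5")], 640, 480)
spotted = decode_sequence(sequence, 640, 480)
```

In a setup like this, an auto-regressive model would be trained to emit such a sequence one token at a time, and the decoding step turns its output back into points and transcriptions. Padding every transcription to a fixed length is one simple way to keep instance boundaries unambiguous. A per-instance end marker would be an alternative.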
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Label Smoothing · Adam · Multi-Head Attention · Residual Connection · Absolute Position Encodings · Byte Pair Encoding
