Towards Unified Scene Text Spotting based on Sequence Generation

Taeho Kil; Seonghyeon Kim; Sukmin Seo; Yoonsik Kim; Daehee Kim

arXiv:2304.03435·cs.CV·April 10, 2023·1 cites

TL;DR

The paper introduces UNITS, a unified scene text spotting model based on sequence generation. It detects text of arbitrary shape and, through starting-point prompting, can extract more text instances than the number it was trained on, while achieving performance competitive with state-of-the-art methods.

Contribution

It presents a unified model that handles multiple detection formats, including quadrilaterals and polygons, and uses starting-point prompting to extract more text instances than the number it was trained on.

Findings

01. Achieves performance competitive with state-of-the-art methods.

02. Extracts more text instances than the number it was trained on.

03. Detects text in arbitrary shapes by unifying detection formats such as quadrilaterals and polygons.

Abstract

Sequence generation models have recently made significant progress in unifying various vision tasks. Although some auto-regressive models have demonstrated promising results in end-to-end text spotting, they use specific detection formats while ignoring various text shapes and are limited in the maximum number of text instances that can be detected. To overcome these limitations, we propose a UNIfied scene Text Spotter, called UNITS. Our model unifies various detection formats, including quadrilaterals and polygons, allowing it to detect text in arbitrary shapes. Additionally, we apply starting-point prompting to enable the model to extract texts from an arbitrary starting point, thereby extracting more texts beyond the number of instances it was trained on. Experimental results demonstrate that our method achieves competitive performance compared to state-of-the-art methods. Further…
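
The abstract does not spell out how detections are serialized, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each text instance is written as a format token followed by quantized coordinates, and that a starting point keeps only the instances at or after that point in raster order. The bin count, the token ids, and the helper names (`quantize`, `encode_instance`, `instances_after`) are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Instance = Tuple[List[Point], str]  # (points, detection format)

N_BINS = 1000  # hypothetical number of coordinate bins

# Hypothetical format tokens, placed right after the coordinate bins.
FORMAT_TOKENS = {"quad": N_BINS, "polygon": N_BINS + 1}


def quantize(value: float, size: float, n_bins: int = N_BINS) -> int:
    """Map a pixel coordinate to an integer bin in [0, n_bins - 1]."""
    ratio = min(max(value / size, 0.0), 1.0)  # clamp to the image extent
    return min(int(ratio * n_bins), n_bins - 1)


def encode_instance(points: List[Point], fmt: str,
                    width: float, height: float) -> List[int]:
    """Serialize one instance as [format token, x1, y1, x2, y2, ...]."""
    tokens = [FORMAT_TOKENS[fmt]]
    for x, y in points:
        tokens.append(quantize(x, width))
        tokens.append(quantize(y, height))
    return tokens


def raster_key(points: List[Point]) -> Tuple[float, float]:
    """Order instances top-to-bottom, then left-to-right, by first point."""
    x, y = points[0]
    return (y, x)


def instances_after(instances: List[Instance], start: Point) -> List[Instance]:
    """Keep instances at or after `start` in raster order, sorted."""
    sx, sy = start
    kept = [inst for inst in instances if raster_key(inst[0]) >= (sy, sx)]
    return sorted(kept, key=lambda inst: raster_key(inst[0]))


if __name__ == "__main__":
    quad = ([(10, 10), (40, 10), (40, 20), (10, 20)], "quad")
    poly = ([(5, 60), (30, 55), (60, 60), (60, 70), (30, 65), (5, 70)], "polygon")
    for points, fmt in instances_after([quad, poly], start=(0, 50)):
        print(fmt, encode_instance(points, fmt, width=100, height=100))
```

Under these assumptions, a decoder with a fixed output budget could be run repeatedly with a later starting point each time, which is one way to read the claim that more instances can be extracted than were seen per training sequence.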

Code & Models

Repositories

clovaai/units (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling