Towards Unified Scene Text Spotting based on Sequence Generation
Taeho Kil, Seonghyeon Kim, Sukmin Seo, Yoonsik Kim, Daehee Kim

TL;DR
The paper introduces UNITS, a unified scene text spotting model based on sequence generation that detects arbitrary-shaped texts and can extract more texts than trained for, achieving competitive results.
Contribution
It presents a novel unified model that handles various text shapes and enables extraction beyond training limits using starting-point prompting.
Findings
Achieves competitive performance with state-of-the-art methods.
Can extract more text instances than it was trained on.
Handles arbitrary-shaped texts effectively.
Abstract
Sequence generation models have recently made significant progress in unifying various vision tasks. Although some auto-regressive models have demonstrated promising results in end-to-end text spotting, they use specific detection formats while ignoring various text shapes and are limited in the maximum number of text instances that can be detected. To overcome these limitations, we propose a UNIfied scene Text Spotter, called UNITS. Our model unifies various detection formats, including quadrilaterals and polygons, allowing it to detect text in arbitrary shapes. Additionally, we apply starting-point prompting to enable the model to extract texts from an arbitrary starting point, thereby extracting more texts beyond the number of instances it was trained on. Experimental results demonstrate that our method achieves competitive performance compared to state-of-the-art methods. Further…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHandwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling
