TiCLS: Tightly Coupled Language Text Spotter

Leeje Jang; Yijun Lin; Yao-Yi Chiang; Jerod Weinman

arXiv:2602.04030 · cs.CV · February 5, 2026

TL;DR

TiCLS is an end-to-end scene text spotting method that integrates external character-level language models to improve recognition of ambiguous or fragmented text in images.

Contribution

TiCLS introduces a linguistic decoder that fuses visual and linguistic features and can be initialized from a pretrained character-level language model, bringing external linguistic knowledge into scene text recognition.

Findings

01 Achieves state-of-the-art results on ICDAR 2015 and Total-Text datasets.

02 Demonstrates the effectiveness of external linguistic knowledge in scene text spotting.

03 Improves recognition robustness for ambiguous or fragmented text instances.

Abstract

Scene text spotting aims to detect and recognize text in real-world images, where instances are often short, fragmented, or visually ambiguous. Existing methods primarily rely on visual cues and implicitly capture local character dependencies, but they overlook the benefits of external linguistic knowledge. Prior attempts to integrate language models either adapt language modeling objectives without external knowledge or apply pretrained models that are misaligned with the word-level granularity of scene text. We propose TiCLS, an end-to-end text spotter that explicitly incorporates external linguistic knowledge from a character-level pretrained language model. TiCLS introduces a linguistic decoder that fuses visual and linguistic features, yet can be initialized by a pretrained language model, enabling robust recognition of ambiguous or fragmented text. Experiments on ICDAR 2015 and…
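The abstract describes a linguistic decoder that fuses visual and linguistic features but does not give its internals here. As a rough illustration only, the sketch below shows one common way such fusion can be done: single-head cross-attention in which per-character linguistic states query visual region features, followed by a residual sum. This is an assumption for illustration, not the TiCLS architecture; the names `cross_attention_fuse`, `char_states`, and `visual_feats` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention_fuse(char_states, visual_feats):
    """Fuse linguistic and visual features with single-head cross-attention.

    ASSUMPTION: a generic fusion scheme, not the method from the paper.
    char_states:  one vector per character position (linguistic side).
    visual_feats: one vector per image region (visual side).
    Returns one fused vector per character position.
    """
    dim = len(char_states[0])
    scale = math.sqrt(dim)
    fused = []
    for query in char_states:
        # Attention weights of this character position over all regions.
        weights = softmax([dot(query, v) / scale for v in visual_feats])
        # Weighted average of the visual features.
        context = [
            sum(w * v[i] for w, v in zip(weights, visual_feats))
            for i in range(dim)
        ]
        # Residual sum keeps the linguistic state and adds visual context.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


# Toy inputs: two character positions, three image regions, dimension 2.
char_states = [[1.0, 0.0], [0.0, 1.0]]
visual_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused = cross_attention_fuse(char_states, visual_feats)
```

In this toy setup each character state attends most strongly to the visual region aligned with it, so the fused vector leans toward that region. A real decoder would add learned projections, multiple heads, and weights initialized from the pretrained language model.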

Taxonomy

Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis