Region Prompt Tuning: Fine-grained Scene Text Detection Utilizing Region Text Prompt
Xingtao Lin, Heqian Qiu, Lanxiao Wang, Ruihang Wang, Linfeng Xu, Hongliang Li

TL;DR
This paper introduces Region Prompt Tuning (RPT), a novel method that enhances scene text detection by focusing on fine-grained features through region-specific prompts, improving detection accuracy on standard benchmarks.
Contribution
The paper proposes a region prompt tuning approach that decomposes prompts into characters and aligns them with visual tokens, enabling fine-grained feature focus in scene text detection.
Findings
RPT achieves state-of-the-art results on ICDAR2015, TotalText, and CTW1500 datasets.
The method effectively balances global and local features for improved detection.
Character-token alignment enhances fine-grained scene text detection performance.
Abstract
Recent advances in prompt tuning have successfully adapted large-scale models such as Contrastive Language-Image Pre-training (CLIP) to downstream tasks such as scene text detection. Typically, the text prompt complements the text encoder's input and focuses on global features while neglecting fine-grained details, so fine-grained text is missed in scene text detection. In this paper, we propose the region prompt tuning (RPT) method for fine-grained scene text detection, in which the proposed region text prompt helps focus on fine-grained features. The region prompt tuning method decomposes the region text prompt into individual characters and splits the visual feature map into region visual tokens, creating a one-to-one correspondence between characters and tokens. This allows each character to match the local features of its token, thereby avoiding the omission of detailed features and…
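The one-to-one correspondence between characters and region visual tokens described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (split_into_region_tokens, char_token_similarity), the nested-list feature map layout (channels x height x width), the row-major token order, and the use of cosine similarity as the matching score are all assumptions made for illustration only.

```python
import math


def split_into_region_tokens(feature_map):
    """Turn a C x H x W nested list into H*W tokens of C values each (row-major)."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors (eps avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def char_token_similarity(char_embeddings, feature_map):
    """Score character i against region token i only (hypothetical matching rule).

    char_embeddings: list of H*W vectors, one per character of the region prompt.
    Returns an H x W grid of per-region similarity scores.
    """
    tokens = split_into_region_tokens(feature_map)
    if len(char_embeddings) != len(tokens):
        raise ValueError("need exactly one character embedding per region token")
    width = len(feature_map[0][0])
    scores = [cosine(ch, tok) for ch, tok in zip(char_embeddings, tokens)]
    # Fold the flat score list back into the spatial layout of the feature map.
    return [scores[i:i + width] for i in range(0, len(scores), width)]
```

In this sketch each character is compared with exactly one local token, rather than one global prompt embedding being compared with the whole image, which is the contrast between fine-grained and global features that the abstract draws.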
Taxonomy
Topics: Handwritten Text Recognition Techniques · Video Analysis and Summarization · Image Retrieval and Classification Techniques
Methods: Focus · ALIGN
