TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models

Yao Xiao; Qiqian Fu; Heyi Tao; Yuqun Wu; Zhen Zhu; Derek Hoiem

arXiv:2505.23769 · cs.CV · November 7, 2025

TL;DR

TextRegion combines frozen image-text models with SAM2 segmentation to produce text-aligned region tokens, enabling detailed open-vocabulary visual understanding without additional training.

Contribution

We introduce TextRegion, a training-free framework that combines image-text models with SAM2 to generate text-aligned, open-vocabulary region tokens that can be applied directly to various visual tasks.

Findings

01 Achieves superior or competitive performance compared to state-of-the-art training-free methods across multiple downstream tasks.

02 Compatible with many image-text models.

03 Requires no additional training.

Abstract

Image-text models excel at image-level tasks but struggle with detailed visual understanding. While these models provide strong visual-language alignment, segmentation models like SAM2 offer precise spatial boundaries for objects. To this end, we propose TextRegion, a simple, effective, and training-free framework that combines the strengths of image-text models and SAM2 to generate powerful text-aligned region tokens. These tokens enable detailed visual understanding while preserving open-vocabulary capabilities. They can be directly applied to various downstream tasks, including open-world semantic segmentation, referring expression comprehension, and grounding. We conduct extensive evaluations and consistently achieve superior or competitive performance compared to state-of-the-art training-free methods. Additionally, our framework is compatible with many image-text models, making it…
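
To make the idea of a text-aligned region token concrete, here is a minimal sketch of one plausible way to build such tokens. It is not the authors' implementation: it assumes that region tokens can be approximated by averaging patch features inside each mask, and that each region is labeled by cosine similarity against text embeddings. The function names `region_tokens` and `label_regions` are hypothetical, and the arrays stand in for the outputs of a frozen image-text model (patch features, text embeddings) and of SAM2 (region masks).

```python
import numpy as np

def region_tokens(patch_feats, masks, eps=1e-8):
    """Mask-average patch features into one unit-length token per region.

    patch_feats: (H, W, D) patch features, assumed to come from a frozen
                 image-text model's visual encoder.
    masks:       (R, H, W) binary masks at patch resolution, assumed to be
                 downsampled segmentation masks (e.g. from SAM2).
    Returns an (R, D) array of L2-normalized region tokens.
    """
    num_regions = masks.shape[0]
    flat_feats = patch_feats.reshape(-1, patch_feats.shape[-1])        # (H*W, D)
    flat_masks = masks.reshape(num_regions, -1).astype(flat_feats.dtype)  # (R, H*W)
    # Normalize each mask so its weights sum to 1 (a mean over covered patches).
    weights = flat_masks / np.clip(flat_masks.sum(axis=1, keepdims=True), eps, None)
    tokens = weights @ flat_feats                                      # (R, D)
    return tokens / np.clip(np.linalg.norm(tokens, axis=1, keepdims=True), eps, None)

def label_regions(tokens, text_embeds, eps=1e-8):
    """Assign each region the index of its most similar text embedding."""
    norms = np.clip(np.linalg.norm(text_embeds, axis=1, keepdims=True), eps, None)
    sims = tokens @ (text_embeds / norms).T                            # (R, T) cosine similarities
    return sims.argmax(axis=1), sims

# Toy example: a 2x2 patch grid whose left and right columns differ.
patch_feats = np.array([[[1.0, 0.0], [0.0, 1.0]],
                        [[1.0, 0.0], [0.0, 1.0]]])
masks = np.array([[[1, 0], [1, 0]],    # region 0 covers the left column
                  [[0, 1], [0, 1]]])   # region 1 covers the right column
text_embeds = np.array([[0.0, 1.0],    # text prompt 0
                        [1.0, 0.0]])   # text prompt 1

tokens = region_tokens(patch_feats, masks)
labels, sims = label_regions(tokens, text_embeds)
```

Because the vocabulary enters only through `text_embeds`, changing the set of prompts changes the label space without retraining anything, which is the sense in which such a pipeline stays open-vocabulary.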

Code & Models

Repositories

avaxiao/TextRegion (PyTorch, official)
