Visual Textualization for Image Prompted Object Detection

Yongjian Wu; Yang Zhou; Jiya Saiyin; Bingzheng Wei; Yan Xu

arXiv:2506.23785·cs.CV·July 1, 2025

TL;DR

VisTex-OVLM introduces visual textualization, which projects a few visual exemplars into the text feature space so that Object-level Vision-Language Models (OVLMs) can better detect rare categories, especially in few-shot settings.

Contribution

The paper presents a visual textualization technique that improves the ability of Object-level Vision-Language Models (OVLMs) to detect rare objects without altering their architecture.

Findings

01. Outperforms previous methods on open-set datasets.

02. Achieves state-of-the-art results on PASCAL VOC and MSCOCO.

03. Effective in few-shot object detection tasks.

Abstract

We propose VisTex-OVLM, a novel image prompted object detection method that introduces visual textualization -- a process that projects a few visual exemplars into the text feature space to enhance Object-level Vision-Language Models' (OVLMs) capability in detecting rare categories that are difficult to describe textually and nearly absent from their pre-training data, while preserving their pre-trained object-text alignment. Specifically, VisTex-OVLM leverages multi-scale textualizing blocks and a multi-stage fusion strategy to integrate visual information from visual exemplars, generating textualized visual tokens that effectively guide OVLMs alongside text prompts. Unlike previous methods, our method keeps the original OVLM architecture unchanged, preserving its generalization capabilities while enhancing performance in few-shot settings. VisTex-OVLM demonstrates superior performance…
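
The idea of turning exemplars into extra prompt tokens can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the textualizing blocks or the multi-stage fusion, so the sketch assumes one plain linear projection per feature scale and a simple mean across scales as stand-ins. All names (`make_weight`, `textualize`, `build_prompt`), the dimensions, and the random features are hypothetical, and the frozen OVLM that would consume the tokens is not modeled.

```python
import random


def linear(vec, weight):
    """Multiply a vector of length d_in by a (d_out x d_in) weight matrix."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def make_weight(d_out, d_in, rng):
    """Hypothetical stand-in for a learned projection: small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]


def textualize(exemplar_feats, weights):
    """Map one exemplar's per-scale features to a single text-space token.

    Assumption: each scale gets its own linear projection, and the projected
    vectors are averaged. The paper's actual fusion is not reproduced here.
    """
    projected = [linear(f, w) for f, w in zip(exemplar_feats, weights)]
    dim = len(projected[0])
    return [sum(p[i] for p in projected) / len(projected) for i in range(dim)]


def build_prompt(text_tokens, exemplars, weights):
    """Append one textualized visual token per exemplar to the text tokens."""
    visual_tokens = [textualize(e, weights) for e in exemplars]
    return text_tokens + visual_tokens


rng = random.Random(0)
TEXT_DIM = 8             # assumed width of the text feature space
SCALE_DIMS = [4, 6, 10]  # assumed feature widths at three visual scales

weights = [make_weight(TEXT_DIM, d, rng) for d in SCALE_DIMS]
text_tokens = [[rng.uniform(-1, 1) for _ in range(TEXT_DIM)] for _ in range(3)]
exemplars = [
    [[rng.uniform(-1, 1) for _ in range(d)] for d in SCALE_DIMS]
    for _ in range(2)
]

prompt = build_prompt(text_tokens, exemplars, weights)
print(len(prompt), len(prompt[-1]))  # prints: 5 8
```

In this toy setup only the projection weights would be trainable, which mirrors the abstract's point that the detector's architecture stays untouched: the model simply receives a longer prompt whose extra tokens live in the same space as the text tokens.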

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning