AutoV: Loss-Oriented Ranking for Visual Prompt Retrieval in LVLMs
Yuan Zhang, Chun-Kai Fan, Sicheng Yu, Junwen Pan, Tao Huang, Ming Lu, Kuan Cheng, Qi She, Shanghang Zhang

TL;DR
AutoV introduces a loss-based prompt retrieval framework that automatically identifies the most suitable visual prompts for large vision-language models, significantly improving their performance across multiple tasks without manual prompt annotation.
Contribution
AutoV proposes a loss-oriented ranking method for automatic visual prompt retrieval, addressing the limitations of manual prompt engineering in LVLMs.
Findings
AutoV improves LLaVA-OV performance by 10.2% on VizWiz.
AutoV boosts Qwen2.5-VL accuracy by 3.8% on MMMU.
The framework enhances various LVLM tasks including image understanding, captioning, grounding, and classification.
Abstract
Inspired by text prompts in large language models, visual prompts have been explored to enhance the perceptual capabilities of large vision-language models (LVLMs). However, performance tends to saturate under single visual prompt designs, making further prompt engineering increasingly ineffective. To address this limitation, we shift from prompt engineering to prompt retrieval and propose AutoV, a lightweight framework for instance-adaptive visual prompt identification. Given an input image and a textual query, AutoV automatically locates the most suitable visual prompt from a diverse candidate pool. Training such a retrieval framework requires prompt-level supervision, yet prompt quality is inherently ambiguous and difficult to assess reliably, even for humans. To enable automatic supervision, we evaluate visual prompts using a pre-trained LVLM and label them according to their…
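The loss-oriented ranking idea can be illustrated with a minimal sketch. It assumes that each candidate visual prompt has already been scored with a scalar loss by a pre-trained LVLM and that a lower loss indicates a more suitable prompt. The function name, the prompt names, and the loss values are all hypothetical placeholders for illustration. None of them comes from the paper.

```python
from typing import Dict, List


def rank_prompts_by_loss(losses: Dict[str, float]) -> List[str]:
    """Order candidate prompt names from lowest to highest loss.

    Assumption: a lower loss means the LVLM answered better when the
    image was augmented with that visual prompt.
    """
    return sorted(losses, key=lambda name: losses[name])


# Hypothetical per-prompt losses for one (image, query) pair.
# In practice these would come from an LVLM forward pass; the values
# below are made up purely to show the ranking step.
example_losses = {
    "attention_heatmap": 0.82,
    "bounding_box": 0.47,
    "blur_mask": 1.10,
    "no_prompt": 0.95,
}

ranking = rank_prompts_by_loss(example_losses)
best_prompt = ranking[0]  # candidate with the lowest loss
print(ranking)
print(best_prompt)
```

Under these assumptions, the resulting order could serve as automatic supervision for a retriever, which would then learn to predict the top-ranked prompt for a new image and query without running the LVLM on every candidate.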
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning
