Exploring Task-Level Optimal Prompts for Visual In-Context Learning
Yan Zhu, Huan Ma, Changqing Zhang

TL;DR
This paper proposes a task-level prompting approach for Visual In-Context Learning that significantly reduces prompt search costs while maintaining near-optimal performance, addressing a key computational challenge in deploying VICL.
Contribution
It introduces the insight that most test samples share the same optimal prompts and develops two efficient task-level prompt search strategies to improve VICL deployment.
Findings
Task-level prompts achieve similar performance to sample-specific prompts.
Proposed strategies significantly reduce prompt search time.
The proposed method reaches near-optimal VICL performance with minimal cost.
Abstract
With the development of Vision Foundation Models (VFMs) in recent years, Visual In-Context Learning (VICL) has become preferable to modifying models in most scenarios. Unlike retraining or fine-tuning a model, VICL does not require modifications to the model's weights or architecture, and only needs a prompt with demonstrations to teach the VFM how to solve tasks. Currently, the significant computational cost of finding optimal prompts for every test sample hinders the deployment of VICL, as determining which demonstrations to use for constructing prompts is very costly. In this paper, however, we find a counterintuitive phenomenon: most test samples actually achieve optimal performance under the same prompts, and searching for sample-level prompts only costs more time while yielding identical prompts. Therefore, we propose task-level prompting to reduce the…
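The cost gap between sample-level and task-level prompt search described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the two search strategies the authors propose are not described in the text above, so the code below only contrasts exhaustive per-sample search with picking a single prompt on a small held-out set. The names `sample_level_search`, `task_level_search`, and `CountingScore` are hypothetical, and the scoring function is a toy stand-in for running a VFM with a prompt and measuring output quality.

```python
from typing import Callable, List, Sequence

# Hypothetical scoring interface: score(prompt, sample) -> quality in [0, 1].
# In a real VICL setup this would run the VFM with the prompt's demonstrations.
Score = Callable[[str, int], float]


def sample_level_search(
    prompts: Sequence[str], samples: Sequence[int], score: Score
) -> List[str]:
    """Pick the best prompt separately for every sample.

    Cost: len(prompts) * len(samples) score evaluations.
    """
    return [max(prompts, key=lambda p: score(p, s)) for s in samples]


def task_level_search(
    prompts: Sequence[str], val_samples: Sequence[int], score: Score
) -> str:
    """Pick one prompt with the highest total score on a small held-out set.

    Cost: len(prompts) * len(val_samples) score evaluations, paid once.
    """
    if not val_samples:
        raise ValueError("val_samples must not be empty")
    return max(prompts, key=lambda p: sum(score(p, s) for s in val_samples))


class CountingScore:
    """Toy scorer (an assumption for illustration) that counts evaluations.

    Prompt "b" is best for most samples; prompt "a" wins only on samples
    divisible by 10, mimicking a task where one prompt dominates.
    """

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str, sample: int) -> float:
        self.calls += 1
        if prompt == "a" and sample % 10 == 0:
            return 0.95
        return {"a": 0.5, "b": 0.9, "c": 0.3}[prompt]
```

Under these toy assumptions, searching three prompts over 100 samples takes 300 evaluations at the sample level, while choosing one prompt on five held-out samples takes 15 and selects the prompt that is already optimal for most samples. How well this transfers to a real task depends on how strongly a single prompt dominates, which is the empirical question the paper examines.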
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology
