Exploring Task-Level Optimal Prompts for Visual In-Context Learning

Yan Zhu; Huan Ma; Changqing Zhang

arXiv:2501.08841·cs.AI·January 16, 2025

Open Access

TL;DR

This paper proposes a task-level prompting approach for Visual In-Context Learning that significantly reduces prompt search costs while maintaining near-optimal performance, addressing a key computational challenge in deploying VICL.

Contribution

It introduces the insight that most test samples share the same optimal prompts and develops two efficient task-level prompt search strategies to improve VICL deployment.

Findings

01 Task-level prompts achieve similar performance to sample-specific prompts.

02 Proposed strategies significantly reduce prompt search time.

03 Method reaches near-optimal VICL performance with minimal cost.

Abstract

With the development of Vision Foundation Models (VFMs) in recent years, Visual In-Context Learning (VICL) has become a better choice than modifying models in most scenarios. Unlike retraining or fine-tuning a model, VICL does not require modifications to the model's weights or architecture; it only needs a prompt with demonstrations to teach the VFM how to solve tasks. Currently, the significant computational cost of finding optimal prompts for every test sample hinders the deployment of VICL, as determining which demonstrations to use for constructing prompts is very costly. In this paper, however, we find a counterintuitive phenomenon: most test samples actually achieve optimal performance under the same prompts, so searching for sample-level prompts only costs more time while yielding identical prompts. Therefore, we propose task-level prompting to reduce the…
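
The contrast between sample-level and task-level prompt selection can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `toy_score` function, and the idea of choosing the task-level prompt by mean score on a small validation set are all assumptions made for illustration, and the paper's own two search strategies are not described in this excerpt.

```python
from typing import Callable, List, Sequence


def sample_level_search(
    prompts: Sequence[int],
    samples: Sequence[int],
    score: Callable[[int, int], float],
) -> List[int]:
    """Search the best prompt separately for every sample.

    Costs len(prompts) * len(samples) score evaluations.
    """
    return [max(prompts, key=lambda p: score(p, s)) for s in samples]


def task_level_search(
    prompts: Sequence[int],
    val_samples: Sequence[int],
    score: Callable[[int, int], float],
) -> int:
    """Pick one prompt for the whole task (hypothetical strategy).

    Chooses the prompt with the highest mean score on a small
    validation set, costing len(prompts) * len(val_samples) evaluations.
    """
    return max(
        prompts,
        key=lambda p: sum(score(p, s) for s in val_samples) / len(val_samples),
    )


def toy_score(prompt: int, sample: int) -> float:
    """Made-up score: most samples prefer prompt 2, a few prefer prompt 4."""
    best = 4 if sample % 10 == 0 else 2
    return 1.0 - 0.1 * abs(prompt - best)


prompts = [0, 1, 2, 3, 4]          # hypothetical candidate prompt ids
test_samples = list(range(1, 21))  # hypothetical test sample ids
val_samples = [1, 2, 3]            # hypothetical small validation set

per_sample = sample_level_search(prompts, test_samples, toy_score)
task_prompt = task_level_search(prompts, val_samples, toy_score)

# Fraction of test samples whose individually searched prompt equals
# the single task-level prompt.
agreement = per_sample.count(task_prompt) / len(test_samples)
print(task_prompt, agreement)
```

In this toy setup the sample-level search needs 5 × 20 = 100 score evaluations, while the task-level search needs 5 × 3 = 15 and then reuses one prompt for every test sample. The toy score is constructed so that most samples share the same best prompt, mirroring the phenomenon the abstract reports; it says nothing about real VFM behavior.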

Taxonomy

Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology