TL;DR
This paper introduces LaPR, a label-aware prompt retrieval framework that explicitly incorporates label cues into prompt selection, improving visual in-context learning (VICL) performance on segmentation, detection, and colorization tasks.
Contribution
LaPR is the first framework to explicitly leverage label information in prompt retrieval for VICL, using joint representations and a mixture-of-experts mechanism.
Findings
LaPR improves VICL performance on segmentation, detection, and colorization tasks.
LaPR generalizes well across different feature extractors and scenarios.
Abstract
Visual in-context learning (VICL) enables visual foundation models to handle multiple tasks by steering them with demonstrative prompts. The choice of prompts strongly influences VICL performance, making prompt selection a key challenge. Prior work has made substantial progress on prompt retrieval and reranking strategies, but focuses mainly on prompt images while overlooking labels. We show that these approaches sometimes retrieve visually similar but label-inconsistent prompts, which can degrade VICL performance. Conversely, higher label consistency between query and prompts tends to indicate stronger VICL results. Motivated by these findings, we develop a framework named LaPR (Label-aware Prompt Retrieval), which highlights the role of labels in prompt selection. Our framework first designs an image-label joint representation for prompts to incorporate label cues explicitly.…
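The failure mode described in the abstract (a prompt that is visually close to the query but carries an inconsistent label) can be illustrated with a small toy sketch. This is not the LaPR implementation: the function names, the use of cosine similarity over image features, mask IoU as the label-consistency measure, and the weighted-concatenation joint representation with its `alpha` parameter are all assumptions made for illustration.

```python
import numpy as np

def cosine_sim(query_feat, pool_feats):
    """Cosine similarity between one query vector and each row of the pool."""
    q = query_feat / np.linalg.norm(query_feat)
    p = pool_feats / np.linalg.norm(pool_feats, axis=1, keepdims=True)
    return p @ q

def mask_iou(a, b):
    """Intersection-over-union of two binary masks (assumed consistency measure)."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union

def retrieve_image_only(query_feat, pool_feats):
    """Index of the prompt whose image feature is closest to the query."""
    return int(np.argmax(cosine_sim(query_feat, pool_feats)))

def joint_representation(image_feat, label_mask, alpha=0.5):
    """Hypothetical image-label joint vector: weighted concatenation of the
    L2-normalized image feature and the L2-normalized flattened label mask."""
    img = image_feat / np.linalg.norm(image_feat)
    lab = label_mask.astype(float).ravel()
    norm = np.linalg.norm(lab)
    if norm > 0:
        lab = lab / norm
    return np.concatenate([(1 - alpha) * img, alpha * lab])

# Toy data (made up): two candidate prompts with 2-D image features and 2x2 masks.
query_feat = np.array([1.0, 0.0])
query_mask = np.array([[1, 1], [0, 0]])   # ground truth, used only for analysis
pool_feats = np.array([[0.9, 0.1],        # prompt 0: visually very similar
                       [0.6, 0.8]])       # prompt 1: visually less similar
pool_masks = np.array([[[0, 0], [1, 1]],  # prompt 0: label disagrees with query
                       [[1, 1], [0, 0]]]) # prompt 1: label agrees with query

top = retrieve_image_only(query_feat, pool_feats)
consistency = [mask_iou(query_mask, m) for m in pool_masks]
joint = joint_representation(pool_feats[0], pool_masks[0])
print("image-only pick:", top, "label consistency per prompt:", consistency)
```

In this toy setup, image-only retrieval selects prompt 0, whose mask does not overlap the query's mask at all, while the less visually similar prompt 1 matches it exactly. The query's label is unknown at inference time, so the ground-truth mask here serves only as an offline diagnostic.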
