Retrieving Counterfactuals Improves Visual In-Context Learning
Guangzhi Xiong, Sanchit Sinha, Zhenghao He, Aidong Zhang

TL;DR
This paper introduces CIRCLES, a framework that enhances vision-language models by retrieving counterfactual examples for better causal reasoning and improved robustness in multimodal tasks.
Contribution
CIRCLES actively constructs demonstration sets using attribute-guided composed image retrieval to incorporate counterfactual examples, advancing beyond passive similarity-based retrieval methods.
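The contrast between passive similarity-based retrieval and attribute-guided counterfactual retrieval can be illustrated with a toy sketch. This is not the CIRCLES implementation. It assumes each pool example carries a precomputed embedding and a dictionary of discrete attributes, and it stands in for composed image retrieval with a simple filter: for each attribute, pick the example nearest to the query that differs in that attribute alone. All names here (`similarity_retrieve`, `counterfactual_retrieve`, `build_demonstrations`, the `emb`/`attrs` fields, and the toy data) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_retrieve(query, pool, k):
    """Passive baseline: the k pool examples closest to the query embedding."""
    ranked = sorted(pool, key=lambda ex: cosine(query["emb"], ex["emb"]), reverse=True)
    return ranked[:k]


def counterfactual_retrieve(query, pool, attributes):
    """Assumed stand-in for composed retrieval: for each attribute, return the
    nearest example that differs from the query in that attribute only."""
    picked = []
    for attr in attributes:
        candidates = [
            ex for ex in pool
            if ex["attrs"][attr] != query["attrs"][attr]
            and all(ex["attrs"][a] == query["attrs"][a] for a in attributes if a != attr)
        ]
        if candidates:
            picked.append(max(candidates, key=lambda ex: cosine(query["emb"], ex["emb"])))
    return picked


def build_demonstrations(query, pool, attributes, k_similar=1):
    """Combine similar examples with one counterfactual-style example per attribute."""
    demos = similarity_retrieve(query, pool, k_similar)
    for ex in counterfactual_retrieve(query, pool, attributes):
        if ex not in demos:
            demos.append(ex)
    return demos


# Toy data (invented for illustration only).
query = {"emb": [1.0, 0.0, 0.2], "attrs": {"color": "red", "shape": "round"}}
pool = [
    {"id": "a", "emb": [0.9, 0.1, 0.2], "attrs": {"color": "red", "shape": "round"}, "label": "apple"},
    {"id": "b", "emb": [0.8, 0.3, 0.1], "attrs": {"color": "green", "shape": "round"}, "label": "lime"},
    {"id": "c", "emb": [0.7, 0.1, 0.6], "attrs": {"color": "red", "shape": "long"}, "label": "chili"},
    {"id": "d", "emb": [0.1, 0.9, 0.3], "attrs": {"color": "green", "shape": "long"}, "label": "cucumber"},
]

demos = build_demonstrations(query, pool, ["color", "shape"])
print([ex["id"] for ex in demos])
```

In this sketch the demonstration set pairs the closest match with examples that flip exactly one attribute, so the label change can be attributed to that attribute rather than to incidental similarity.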
Findings
CIRCLES outperforms existing retrieval methods across multiple datasets.
The improvements are especially pronounced on small-scale models.
The method retrieves more diverse and causally informative examples.
Abstract
Vision-language models (VLMs) have achieved impressive performance across a wide range of multimodal reasoning tasks, but they often struggle to disentangle fine-grained visual attributes and reason about underlying causal relationships. In-context learning (ICL) offers a promising avenue for VLMs to adapt to new tasks, but its effectiveness critically depends on the selection of demonstration examples. Existing retrieval-augmented approaches typically rely on passive similarity-based retrieval, which tends to select correlated but non-causal examples, amplifying spurious associations and limiting model robustness. We introduce CIRCLES (Composed Image Retrieval for Causal Learning Example Selection), a novel framework that actively constructs demonstration sets by retrieving counterfactual-style examples through targeted, attribute-guided composed image retrieval. By incorporating…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
