Retrieving Counterfactuals Improves Visual In-Context Learning

Guangzhi Xiong; Sanchit Sinha; Zhenghao He; Aidong Zhang

arXiv:2603.16737·cs.CV·March 18, 2026

Open Access

TL;DR

This paper introduces CIRCLES, a framework that enhances vision-language models by retrieving counterfactual examples for better causal reasoning and improved robustness in multimodal tasks.

Contribution

CIRCLES actively constructs demonstration sets using attribute-guided composed image retrieval to incorporate counterfactual examples, advancing beyond passive similarity-based retrieval methods.
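The contrast between passive similarity-based retrieval and attribute-guided composed retrieval can be illustrated with a toy sketch. This is not the paper's implementation: the 3-dimensional embeddings, the helper names (`cosine`, `top_k`, `composed_query`), and the idea of composing a query by swapping one attribute direction in embedding space are all assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, pool, k):
    """Return the ids of the k pool items most similar to the query."""
    ranked = sorted(pool, key=lambda item: cosine(query, item["emb"]), reverse=True)
    return [item["id"] for item in ranked[:k]]

def composed_query(query, attr_from, attr_to):
    """Hypothetical composition step: remove one attribute direction, add another."""
    return [q - f + t for q, f, t in zip(query, attr_from, attr_to)]

# Toy embeddings (assumed layout): [car-ness, red-ness, blue-ness]
pool = [
    {"id": "red_car_1", "emb": [1.0, 0.9, 0.0]},
    {"id": "red_car_2", "emb": [1.0, 0.8, 0.1]},
    {"id": "blue_car_1", "emb": [1.0, 0.0, 0.9]},
    {"id": "blue_truck", "emb": [0.3, 0.1, 1.0]},
]
RED = [0.0, 1.0, 0.0]
BLUE = [0.0, 0.0, 1.0]
query = [1.0, 1.0, 0.0]  # a red car

# Passive retrieval: nearest neighbours share every attribute with the query.
similar = top_k(query, pool, k=2)

# Composed retrieval: same object, one attribute changed (red -> blue).
counterfactual = top_k(composed_query(query, RED, BLUE), pool, k=1)

# Demonstration set mixing both kinds of example, without duplicates.
demos = similar + [c for c in counterfactual if c not in similar]
print(demos)
```

In this toy pool, plain similarity search returns only red cars, while the composed query surfaces a blue car that differs from the query in a single attribute, which is the kind of counterfactual-style demonstration the framework is described as retrieving.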

Findings

01. CIRCLES outperforms existing retrieval methods across multiple datasets.

02. The improvements are especially significant on small-scale models.

03. The method retrieves more diverse and causally informative examples.

Abstract

Vision-language models (VLMs) have achieved impressive performance across a wide range of multimodal reasoning tasks, but they often struggle to disentangle fine-grained visual attributes and reason about underlying causal relationships. In-context learning (ICL) offers a promising avenue for VLMs to adapt to new tasks, but its effectiveness critically depends on the selection of demonstration examples. Existing retrieval-augmented approaches typically rely on passive similarity-based retrieval, which tends to select correlated but non-causal examples, amplifying spurious associations and limiting model robustness. We introduce CIRCLES (Composed Image Retrieval for Causal Learning Example Selection), a novel framework that actively constructs demonstration sets by retrieving counterfactual-style examples through targeted, attribute-guided composed image retrieval. By incorporating…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)