COCO-Counterfactuals: Automatically Constructed Counterfactual Examples for Image-Text Pairs

Tiep Le; Vasudev Lal; Phillip Howard

arXiv:2309.14356 · cs.LG · November 1, 2023 · 5 cites

TL;DR

This paper introduces a scalable framework that uses text-to-image diffusion models to automatically generate multimodal counterfactual image-text pairs, and applies it to build COCO-Counterfactuals, a dataset for evaluating and improving the robustness of vision-language models.

Contribution

The paper presents a novel framework for creating multimodal counterfactuals and introduces the COCO-Counterfactuals dataset, enabling better evaluation and training of vision-language models.

Findings

1. Existing multimodal models struggle with counterfactual image-text pairs.

2. Human evaluation validates COCO-Counterfactuals as high quality.

3. Augmenting training data with counterfactuals improves out-of-domain generalization.

Abstract

Counterfactual examples have proven to be valuable in the field of natural language processing (NLP) for both evaluating and improving the robustness of language models to spurious correlations in datasets. Despite their demonstrated utility for NLP, multimodal counterfactual examples have been relatively unexplored due to the difficulty of creating paired image-text data with minimal counterfactual changes. To address this challenge, we introduce a scalable framework for automatic generation of counterfactual examples using text-to-image diffusion models. We use our framework to create COCO-Counterfactuals, a multimodal counterfactual dataset of paired image and text captions based on the MS-COCO dataset. We validate the quality of COCO-Counterfactuals through human evaluations and show that existing multimodal models are challenged by our counterfactual image-text pairs. Additionally,…
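To make "minimal counterfactual changes" concrete on the text side, here is a minimal sketch, assuming a counterfactual caption differs from the original by a single substituted word. The helper names and the example caption are hypothetical illustrations, not taken from the paper or the dataset, and this is not the authors' procedure; the paired images, which the framework generates with text-to-image diffusion models, are not shown.

```python
import re
from typing import List


def make_counterfactual_caption(caption: str, target: str, replacement: str) -> str:
    """Replace the first whole-word occurrence of `target` with `replacement`.

    Hypothetical helper: it only illustrates a one-word caption edit.
    """
    pattern = r"\b" + re.escape(target) + r"\b"
    # A function replacement avoids any backslash interpretation in `replacement`.
    new_caption, n_subs = re.subn(pattern, lambda _m: replacement, caption, count=1)
    if n_subs == 0:
        raise ValueError(f"{target!r} not found in caption")
    return new_caption


def changed_positions(a: str, b: str) -> List[int]:
    """Return the indices of whitespace-separated tokens that differ."""
    tokens_a, tokens_b = a.split(), b.split()
    if len(tokens_a) != len(tokens_b):
        raise ValueError("captions must have the same number of tokens")
    return [i for i, (x, y) in enumerate(zip(tokens_a, tokens_b)) if x != y]


# Made-up caption for illustration only (not from MS-COCO).
original = "A dog sitting on a wooden bench"
counterfactual = make_counterfactual_caption(original, "dog", "cat")
diff = changed_positions(original, counterfactual)
```

Checking that `diff` holds exactly one index is a simple way to confirm that a caption pair differs in only one word, which is the sense of "minimal" assumed in this sketch.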

Videos

COCO-Counterfactuals: Automatically Constructed Counterfactual Examples for Image-Text Pairs · SlidesLive

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Diffusion