Towards Deconfounded Image-Text Matching with Causal Inference

Wenhui Li; Xinqi Su; Dan Song; Lanjun Wang; Kun Zhang; An-An Liu

arXiv:2408.12292·cs.CV·August 23, 2024

TL;DR

This paper introduces a causal inference approach to improve image-text matching by removing dataset bias and spurious correlations, leading to better generalization on benchmark datasets.

Contribution

It proposes a novel Deconfounded Causal Inference Network (DCIN) that uses Structural Causal Models and backdoor adjustment to mitigate intra- and inter-modal biases in image-text matching.
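As a point of reference, backdoor adjustment in its generic textbook form estimates P(Y | do(X)) = sum over z of P(Y | X, z) P(z), where z ranges over the values of a confounder Z. The sketch below applies that formula to a toy discrete problem and compares it with the plain conditional P(Y | X). It is not DCIN's implementation, which this summary does not describe; the variable names, the binary confounder, and every probability are assumptions made up for illustration.

```python
# Toy illustration of backdoor adjustment (NOT the DCIN model).
# All tables below are hypothetical numbers chosen for this example.

# P(Z=z): prior over a binary confounder Z
P_Z = {0: 0.5, 1: 0.5}
# P(X=1 | Z=z): the confounder influences which input X is observed
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}
# P(Y=1 | X=x, Z=z): outcome depends on both X and Z, keyed by (x, z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.7}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observational(x):
    """P(Y=1 | X=x): weights each stratum by P(z | x), so it absorbs the bias."""
    p_x = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return sum(
        P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * P_Z[z] / p_x for z in P_Z
    )


def backdoor(x):
    """P(Y=1 | do(X=x)): weights each stratum by the prior P(z) instead."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    print("observational effect:", observational(1) - observational(0))
    print("adjusted effect:     ", backdoor(1) - backdoor(0))
```

In this toy setup the observational gap between X=1 and X=0 is larger than the adjusted gap, because Z raises the chance of both X=1 and Y=1. That inflated association is the kind of spurious correlation that adjusting over a confounder is meant to remove.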

Findings

01

DCIN outperforms existing methods on Flickr30K and MSCOCO datasets.

02

The approach effectively reduces dataset bias and improves matching accuracy.

03

Experimental results demonstrate superior generalization capabilities.

Abstract

Prior image-text matching methods have shown remarkable performance on many benchmark datasets, but most of them overlook dataset bias, which exists both within and across modalities, and tend to learn spurious correlations that severely degrade the model's generalization ability. Furthermore, these methods often incorporate biased external knowledge from large-scale datasets as prior knowledge into the image-text matching model, which inevitably forces the model to learn further biased associations. To address these limitations, this paper first uses Structural Causal Models (SCMs) to illustrate how intra- and inter-modal confounders damage image-text matching. Then, we employ backdoor adjustment to propose an innovative Deconfounded Causal Inference Network (DCIN) for the image-text matching task. DCIN (1) decomposes the intra- and inter-modal confounders and…

Taxonomy

Methods: Causal inference