See or Guess: Counterfactually Regularized Image Captioning

Qian Cao; Xu Chen; Ruihua Song; Xiting Wang; Xinting Huang; Yuchen Ren

arXiv:2408.16809 · cs.CV · September 2, 2024

TL;DR

This paper introduces a causal inference-based framework for image captioning that enhances model robustness and interpretability, especially in counterfactual scenarios, by reducing hallucinations and improving faithfulness.

Contribution

It proposes a novel counterfactual regularization approach with two variants, improving generalizability and interpretability of image captioning models.

Findings

01. Reduces hallucinations in image captioning.
02. Enhances faithfulness to images across datasets.
03. Improves robustness in counterfactual scenarios.

Abstract

Image captioning, which generates natural language descriptions of the visual information in an image, is a crucial task in vision-language research. Previous models have typically addressed this task by aligning the generative capabilities of machines with human intelligence through statistical fitting of existing datasets. While effective for normal images, they may struggle to accurately describe images in which certain parts are obscured or edited, unlike humans, who excel in such cases. These weaknesses, which include hallucinations and limited interpretability, often hinder performance in scenarios with shifted association patterns. In this paper, we present a generic image captioning framework that employs causal inference to make existing models more capable on interventional tasks and counterfactually explainable. Our approach includes two variants leveraging…

Code & Models

Repositories

aman-4-real/see-or-guess
jax · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Causal inference