From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment

Yusuke Hirota; Ryo Hachiuma; Chao-Han Huck Yang; Yuta Nakashima

arXiv:2406.13912·cs.CV·June 21, 2024

TL;DR

This paper investigates the negative side effects of generative caption enrichment in vision-language models, revealing increased gender bias and hallucination, and warns against overly descriptive captioning practices.

Contribution

It uncovers the unintended amplification of bias and hallucination caused by generative caption enrichment, highlighting potential risks in current captioning methods.

Findings

01

Enriched captions exhibit higher gender bias and hallucination.

02

Models trained on enriched captions amplify gender bias by an average of 30.9%.

03

Models trained on enriched captions increase hallucination by 59.5%.

Abstract

Large language models (LLMs) have enhanced the capacity of vision-language models to caption visual text. This generative approach to image caption enrichment further makes textual captions more descriptive, improving alignment with the visual context. However, while many studies focus on benefits of generative caption enrichment (GCE), are there any negative side effects? We compare standard-format captions and recent GCE processes from the perspectives of "gender bias" and "hallucination", showing that enriched captions suffer from increased gender bias and hallucination. Furthermore, models trained on these enriched captions amplify gender bias by an average of 30.9% and increase hallucination by 59.5%. This study serves as a caution against the trend of making captions more descriptive.
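
The percentages above read as relative increases over a standard-caption baseline. The abstract does not say how hallucination is scored, so the following is only a minimal sketch under stated assumptions: it uses a simple object-level rate (the share of mentioned objects absent from the image annotations), and the function names and toy object lists are hypothetical, not the authors' metric or data.

```python
def hallucination_rate(mentioned_objects, ground_truth_objects):
    """Fraction of distinct mentioned objects that are not annotated in the image.

    Assumed, simplified metric for illustration only.
    """
    mentioned = set(mentioned_objects)
    if not mentioned:
        return 0.0
    hallucinated = mentioned - set(ground_truth_objects)
    return len(hallucinated) / len(mentioned)


def relative_increase(baseline, enriched):
    """Relative change of `enriched` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (enriched - baseline) / baseline * 100.0


# Toy, made-up object lists (not taken from the paper).
ground_truth = ["dog", "frisbee", "grass"]
standard_caption_objects = ["dog", "frisbee", "grass", "ball"]
enriched_caption_objects = ["dog", "frisbee", "grass", "tree", "bench"]

standard_rate = hallucination_rate(standard_caption_objects, ground_truth)
enriched_rate = hallucination_rate(enriched_caption_objects, ground_truth)
increase_pct = relative_increase(standard_rate, enriched_rate)

print(f"standard: {standard_rate:.2f}, enriched: {enriched_rate:.2f}, "
      f"relative increase: {increase_pct:.1f}%")
```

In this toy case the enriched caption mentions more objects and a larger share of them are unsupported, so the relative increase is positive. A reported figure such as 59.5% would come from averaging a comparison of this kind over a full evaluation set.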

Taxonomy

Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media
