What Makes for Good Image Captions?

Delong Chen; Samuel Cahyawijaya; Etsuko Ishii; Ho Shu Chan; Yejin Bang; Pascale Fung

arXiv:2405.00485 · cs.CV · August 21, 2025 · 1 citation

TL;DR

This paper introduces an information-theoretic framework in which good image captions are informationally sufficient, minimally redundant, and readily comprehensible to humans, and proposes the Pyramid of Captions (PoCa) method to enrich captions.

Contribution

It formalizes image captioning as an information-theoretic problem and introduces the Pyramid of Captions (PoCa), a method that enriches captions by integrating local and global visual information.

Findings

01 PoCa enhances caption quality in experiments

02 Framework effectively balances informativeness and redundancy

03 Theoretical proof supports PoCa's effectiveness

Abstract

This paper establishes a formal information-theoretic framework for image captioning, conceptualizing captions as compressed linguistic representations that selectively encode semantic units in images. Our framework posits that good image captions should balance three key properties: informational sufficiency, minimal redundancy, and ready comprehensibility by humans. By formulating these properties as quantitative measures with adjustable weights, our framework provides a flexible foundation for analyzing and optimizing image captioning systems across diverse task requirements. To demonstrate its applicability, we introduce the Pyramid of Captions (PoCa) method, which generates enriched captions by integrating local and global visual information. We present both a theoretical proof that PoCa improves caption quality under certain assumptions, and empirical validation of its effectiveness…
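To make the idea of "quantitative measures with adjustable weights" concrete, here is a minimal Python sketch. It is not the paper's formulation: the set-based measures, the `Weights` class, the `caption_score` function, and the `readability` input are all hypothetical stand-ins chosen for illustration, assuming an image can be described as a set of semantic units and a caption as a list of the units it mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical adjustable weights for the three properties."""
    sufficiency: float = 1.0
    redundancy: float = 1.0
    comprehensibility: float = 1.0


def caption_score(image_units, caption_units, readability, weights=Weights()):
    """Toy weighted score; every measure here is an illustrative assumption.

    image_units:   iterable of semantic units present in the image
    caption_units: iterable of units the caption mentions (repeats allowed)
    readability:   externally supplied comprehensibility score in [0, 1]
    """
    image_set = set(image_units)
    mentioned = list(caption_units)
    covered = image_set & set(mentioned)

    # Sufficiency: share of the image's semantic units the caption mentions.
    sufficiency = len(covered) / len(image_set) if image_set else 0.0

    # Redundancy: share of caption mentions that are repeats or not in the image.
    redundancy = 1.0 - len(covered) / len(mentioned) if mentioned else 0.0

    return (weights.sufficiency * sufficiency
            - weights.redundancy * redundancy
            + weights.comprehensibility * readability)


# Example inputs (made up for illustration).
image = {"dog", "ball", "grass", "sky"}
short_caption = ["dog", "ball"]
long_caption = ["dog", "dog", "ball", "grass"]

balanced = Weights()
coverage_only = Weights(sufficiency=2.0, redundancy=0.0, comprehensibility=0.0)

short_balanced = caption_score(image, short_caption, 0.5, balanced)
long_balanced = caption_score(image, long_caption, 0.5, balanced)
short_coverage = caption_score(image, short_caption, 0.5, coverage_only)
long_coverage = caption_score(image, long_caption, 0.5, coverage_only)
```

Under the default weights the repeated mention in the longer caption offsets its extra coverage, while the coverage-only weights favor it. This is the kind of task-dependent trade-off that adjustable weights are meant to express.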

Taxonomy

Topics: Subtitles and Audiovisual Media · Media, Gender, and Advertising · Translation Studies and Practices