IC3: Image Captioning by Committee Consensus

David M. Chan; Austin Myers; Sudheendra Vijayanarasimhan; David A. Ross; John Canny

arXiv:2302.01328 · cs.CV · October 20, 2023

TL;DR

IC3 introduces a novel ensemble approach for image captioning that synthesizes multiple viewpoints to produce more comprehensive and informative captions, outperforming existing state-of-the-art models in human and automated evaluations.

Contribution

The paper proposes a new committee consensus method for image captioning that captures diverse scene details, improving caption informativeness and system recall performance.

Findings

01 Human raters judge IC3 captions at least as helpful as captions from baseline SOTA models more than two-thirds of the time.

02 IC3 improves the performance of SOTA automated recall systems by up to 84%.

03 IC3 outperforms both single human-generated reference captions and state-of-the-art captioning approaches.

Abstract

If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single "best" (most like a reference) image caption. Unfortunately, doing so encourages captions that are "informationally impoverished," and focus on only a subset of the possible details, while ignoring other potentially useful information in the scene. In this work, we introduce a simple, yet novel, method: "Image Captioning by Committee Consensus" (IC3), designed to generate a single caption that captures high-level details from several annotator viewpoints. Humans rate captions produced by IC3 at least as helpful as baseline SOTA models more than two thirds of the time, and IC3 can improve the performance of SOTA automated recall systems by up to 84%, outperforming single human-generated reference captions, and indicating…
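The abstract does not spell out the mechanism, so the following is only a minimal sketch of the committee idea under one stated assumption: several candidate captions are sampled for the same image, and a text model then condenses them into one caption. Every name here (`build_consensus_prompt`, `caption_by_committee`, `toy_sampler`, `toy_summarizer`) and the prompt wording are hypothetical illustrations, not the API of the davidmchan/caption-by-committee repository.

```python
from itertools import cycle
from typing import Callable, List


def build_consensus_prompt(candidates: List[str]) -> str:
    """Format candidate captions as one summarization request (illustrative wording)."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, start=1))
    return (
        "The following captions describe the same image from different viewpoints.\n"
        f"{numbered}\n"
        "Write one caption that combines the details they agree on."
    )


def caption_by_committee(
    image: object,
    sample_caption: Callable[[object], str],
    summarize: Callable[[str], str],
    k: int = 5,
) -> str:
    """Sample k candidate captions, drop duplicates, and summarize the rest.

    `sample_caption` and `summarize` are assumed stand-ins for a stochastic
    captioning model and a text summarization model.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    candidates = [sample_caption(image).strip() for _ in range(k)]
    # dict.fromkeys removes repeats while keeping first-seen order.
    unique = list(dict.fromkeys(c for c in candidates if c))
    return summarize(build_consensus_prompt(unique))


# Toy stand-ins so the sketch runs without any model.
_FAKE_CAPTIONS = cycle([
    "a dog running on a beach",
    "a brown dog chasing a red ball",
    "waves breaking behind a dog",
])


def toy_sampler(image: object) -> str:
    return next(_FAKE_CAPTIONS)


def toy_summarizer(prompt: str) -> str:
    # Joins the numbered candidate lines; a real summarizer would be a language model.
    lines = [ln.split(". ", 1)[1] for ln in prompt.splitlines() if ln[:1].isdigit()]
    return "; ".join(lines)


if __name__ == "__main__":
    print(caption_by_committee("photo.jpg", toy_sampler, toy_summarizer, k=4))
```

Passing the captioner and summarizer as plain callables keeps the sketch independent of any particular model. Swapping in real models would only change those two arguments.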

Code & Models

Repositories

davidmchan/caption-by-committee (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization