CIDEr: Consensus-based Image Description Evaluation

Ramakrishna Vedantam; C. Lawrence Zitnick; Devi Parikh

arXiv:1411.5726 · cs.CV · June 4, 2015 · 61 citations

Open access · 5 code repositories

TL;DR

This paper introduces CIDEr, an automated metric that scores an image description by how closely it matches the consensus of human-written descriptions, together with two new datasets and a benchmark of existing methods.

Contribution

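It presents a consensus-based evaluation paradigm with three parts: a triplet-based protocol for collecting human judgments of consensus, a new automated metric (CIDEr), and two datasets, PASCAL-50S and ABSTRACT-50S, each with 50 human-written sentences per image.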
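For orientation, here is a summary of how the plain CIDEr score is usually stated, written from our reading of the paper rather than taken from this page. Each sentence becomes a TF-IDF vector over its n-grams, and the score is the average cosine similarity between the candidate and the reference sentences, averaged again over n-gram lengths. The widely used CIDEr-D variant (MS COCO evaluation server) adds further modifications that are not shown here.

```latex
% TF-IDF weight of n-gram \omega_k in sentence s_{ij} (reference j of image I_i);
% h_k(s) counts \omega_k in s, \Omega is the n-gram vocabulary, I is the image set
g_k(s_{ij}) = \frac{h_k(s_{ij})}{\sum_{\omega_l \in \Omega} h_l(s_{ij})}
  \log\!\left( \frac{|I|}{\sum_{I_p \in I} \min\!\left(1, \sum_q h_k(s_{pq})\right)} \right)

% Average cosine similarity between candidate c_i and the m references S_i = \{s_{i1}, \dots, s_{im}\}
\mathrm{CIDEr}_n(c_i, S_i) = \frac{1}{m} \sum_{j}
  \frac{\mathbf{g}^n(c_i) \cdot \mathbf{g}^n(s_{ij})}
       {\lVert \mathbf{g}^n(c_i) \rVert \, \lVert \mathbf{g}^n(s_{ij}) \rVert}

% Combine n-gram lengths n = 1..N with uniform weights w_n = 1/N (N = 4)
\mathrm{CIDEr}(c_i, S_i) = \sum_{n=1}^{N} w_n \, \mathrm{CIDEr}_n(c_i, S_i)
```

The IDF factor drives n-grams that occur in the references of nearly every image (such as "a") toward zero weight, so the score rewards content that is specific to the image's own consensus.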

Findings

01

CIDEr captures human judgments of consensus better than existing metrics such as BLEU, ROUGE and METEOR, across sentences generated by various sources.

02

The new PASCAL-50S and ABSTRACT-50S datasets, with 50 reference sentences per image, support more reliable benchmarking of image description methods.

03

Five state-of-the-art image description approaches are evaluated with the new protocol, providing a benchmark for future comparisons.
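Agreement with human judgment rests on triplet annotations: an annotator sees a reference sentence A and two candidate sentences B and C and picks the one more similar to A. Below is a minimal Python sketch of scoring an automatic metric against such data. It is not the paper's code. The `(image_id, b, c, chose_b)` record format and the functions `human_winners` and `metric_accuracy` are hypothetical. Votes over reference sentences are combined by simple majority, and tied pairs are dropped; both choices are assumptions, and the paper's exact aggregation may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical record for one reference sentence A of an image:
# (image_id, candidate_b, candidate_c, annotator_chose_b).
# Assumes each candidate pair is always listed in the same (b, c) order.
Judgment = Tuple[str, str, str, bool]
Pair = Tuple[str, str, str]


def human_winners(judgments: Iterable[Judgment]) -> Dict[Pair, bool]:
    """Majority vote over references: True if B won the pair, False if C won."""
    margin: Dict[Pair, int] = defaultdict(int)
    for image_id, b, c, chose_b in judgments:
        margin[(image_id, b, c)] += 1 if chose_b else -1
    # Pairs with tied votes have no consensus winner and are dropped (assumption)
    return {pair: m > 0 for pair, m in margin.items() if m != 0}


def metric_accuracy(judgments: Iterable[Judgment],
                    score: Callable[[str, str], float]) -> float:
    """Fraction of candidate pairs where the metric prefers the human winner."""
    winners = human_winners(judgments)
    if not winners:
        return 0.0
    agree = 0
    for (image_id, b, c), b_won in winners.items():
        sb, sc = score(image_id, b), score(image_id, c)
        # A metric tie counts as a disagreement (assumption)
        if sb != sc and (sb > sc) == b_won:
            agree += 1
    return agree / len(winners)
```

Here `score` can be any metric, for example a CIDEr implementation wrapped to look up the candidate sentence for an image, so several metrics can be compared on the same judgments.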

Abstract

Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new triplet-based method of collecting human annotations to measure consensus, a new automated metric (CIDEr) that captures consensus, and two new datasets: PASCAL-50S and ABSTRACT-50S that contain 50 sentences describing each image. Our simple metric captures human judgment of consensus better than existing metrics across sentences generated by various sources. We also evaluate five state-of-the-art image…
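The metric named in the abstract can be sketched in a few lines of Python. This is a minimal sketch of the plain CIDEr score under stated assumptions: sentences are already tokenized (the paper also stems words, which this sketch skips), document frequencies are counted over the reference sets only, and an n-gram that never appears in any reference gets its document frequency clamped to 1. The function names `ngrams`, `tfidf`, `cosine` and `cider` are illustrative. This is not the CIDEr-D code used by the MS COCO evaluation server.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[Tuple[str, ...], float]


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of length n in a tokenized sentence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf(counts: Counter, doc_freq: Counter, num_images: int) -> Vector:
    """Weight each n-gram by term frequency times log(|I| / document frequency)."""
    total = sum(counts.values())
    # An n-gram absent from every reference set has df = 0; clamp it to 1 (assumption)
    return {g: (c / total) * math.log(num_images / max(doc_freq[g], 1))
            for g, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0 if either is all zeros."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cider(candidates: Dict[str, List[str]],
          references: Dict[str, List[List[str]]],
          max_n: int = 4) -> Dict[str, float]:
    """Plain CIDEr per image: mean over n = 1..max_n of the average cosine
    similarity between the candidate and each reference (uniform w_n = 1/N)."""
    num_images = len(references)
    # Document frequency: number of images whose reference set contains each n-gram
    doc_freq: Counter = Counter()
    for refs in references.values():
        seen = set()
        for ref in refs:
            for n in range(1, max_n + 1):
                seen.update(ngrams(ref, n))
        doc_freq.update(seen)

    scores = {}
    for image_id, cand in candidates.items():
        refs = references[image_id]
        total = 0.0
        for n in range(1, max_n + 1):
            cand_vec = tfidf(ngrams(cand, n), doc_freq, num_images)
            sims = [cosine(cand_vec, tfidf(ngrams(r, n), doc_freq, num_images))
                    for r in refs]
            total += sum(sims) / len(refs)
        scores[image_id] = total / max_n
    return scores
```

The document frequencies need references for several images. With a single image, every IDF weight is log(1/1) = 0, so all scores collapse to zero. The same sentence "a dog runs" scores above zero against references for a dog image and exactly zero against references for a cat image, because "a" carries no weight once it appears in every reference set.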

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques