SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

TL;DR
SPICE is a new automatic image caption evaluation metric that focuses on semantic propositional content using scene graphs, showing higher correlation with human judgments than existing metrics.
Contribution
The paper introduces SPICE, a semantic-based evaluation metric for image captions that improves correlation with human judgments over traditional n-gram based metrics.
Findings
SPICE achieves a system-level correlation of 0.88 with human judgments on the MS COCO dataset.
By the same measure, SPICE outperforms CIDEr (0.43) and METEOR (0.53).
It can evaluate specific captioning skills like color understanding and counting.
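To make the findings above concrete, here is a minimal sketch of the kind of computation involved: captions are reduced to sets of semantic proposition tuples (objects, attributes, relations), and the score is an F-score over the tuples shared by candidate and reference. The tuples are hand-written here, and the names `f_score`, `color_only`, and `COLORS` are hypothetical. The actual metric derives tuples from parsed scene graphs and also matches synonyms, which this sketch omits.

```python
from typing import Iterable, Set, Tuple

# A proposition is (object,), (object, attribute), or (subject, relation, object).
Proposition = Tuple[str, ...]

# Hypothetical color vocabulary, used only to isolate color propositions.
COLORS = {"red", "green", "blue", "yellow", "black", "white", "brown"}


def f_score(candidate: Set[Proposition], reference: Set[Proposition]) -> float:
    """F1 over exactly matching tuples (simplification: no synonym matching)."""
    matched = len(candidate & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


def color_only(props: Iterable[Proposition]) -> Set[Proposition]:
    """Keep only (object, attribute) tuples whose attribute is a color word."""
    return {p for p in props if len(p) == 2 and p[1] in COLORS}


# Hand-written tuples standing in for parsed scene graphs (illustrative only).
candidate = {
    ("girl",), ("table",), ("girl", "young"),
    ("table", "red"), ("girl", "sit-at", "table"),
}
reference = {
    ("girl",), ("table",), ("table", "red"),
    ("table", "wooden"), ("girl", "stand-near", "table"),
}

overall = f_score(candidate, reference)
color = f_score(color_only(candidate), color_only(reference))
print(f"overall F-score: {overall:.2f}, color-only F-score: {color:.2f}")
```

Restricting both tuple sets to one category before scoring is what allows a skill-specific question, such as how well a system handles colors, to be asked with the same F-score.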
Abstract
There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the task of simulating human judgment. We hypothesize that semantic propositional content is an important component of human caption evaluation, and propose a new automated caption evaluation metric defined over scene graphs coined SPICE. Extensive evaluations across a range of models and datasets indicate that SPICE captures human judgments over model-generated captions better than other automatic metrics (e.g., system-level correlation of 0.88 with human judgments on the MS COCO dataset, versus 0.43 for CIDEr and 0.53 for METEOR). Furthermore, SPICE can answer questions such as 'which caption-generator best understands colors?' and 'can…
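The system-level correlation quoted in the abstract compares one metric score per captioning system against one human score per system. A minimal sketch follows, assuming Pearson correlation is the statistic used. The function name `pearson` and all numbers below are made up for illustration and are not results from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical per-system scores: one entry per captioning system.
metric_scores = [0.10, 0.15, 0.18, 0.22]  # made-up automatic metric scores
human_scores = [0.20, 0.25, 0.40, 0.45]   # made-up human judgment scores

r = pearson(metric_scores, human_scores)
print(f"system-level correlation: {r:.2f}")
```

A value near 1 means the metric ranks and spaces systems much as human judges do, which is the sense in which 0.88 is compared against 0.43 and 0.53.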
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization
