Intrinsic Image Captioning Evaluation

Chao Zeng; Sam Kwong

arXiv:2012.07333 · cs.CV · December 15, 2020 · 1 citation

TL;DR

This paper introduces I2CE, a learning-based metric for image captioning evaluation that captures semantic similarity and intrinsic information, complementing existing metrics.

Contribution

The paper proposes a novel, learning-based evaluation metric called I2CE, inspired by auto-encoder mechanisms and word embeddings, for more comprehensive caption assessment.

Findings

1. I2CE maintains robust performance across models.
2. It provides flexible scores for semantically similar captions.
3. It complements existing evaluation metrics.

Abstract

The image captioning task is to generate suitable descriptions for images. The task poses several challenges, such as accuracy, fluency, and diversity. However, few metrics cover all of these properties when evaluating the results of captioning models. In this paper, we first conduct a comprehensive investigation of contemporary metrics. Motivated by the auto-encoder mechanism and research advances in word embeddings, we propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE). We select several state-of-the-art image captioning models and test their performance on the MS COCO dataset with respect to both contemporary metrics and the proposed I2CE. Experimental results show that our proposed method maintains robust performance and gives more flexible scores to candidate captions when encountered with…
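The abstract does not describe how I2CE computes its score. The sketch below is therefore a generic illustration only, not I2CE, which is a learned, auto-encoder-inspired metric. It shows why an embedding-based comparison can give more flexible scores than exact word matching when a candidate caption uses synonyms. The word vectors in `TOY_EMBEDDINGS` are hypothetical, hand-picked 3-d values. The helper names `unigram_precision` and `embedding_similarity` are also hypothetical. A real embedding-based metric would use learned vectors.

```python
import math

# Hypothetical toy word vectors, hand-picked for illustration only.
# A real embedding-based metric would use learned vectors, not these.
TOY_EMBEDDINGS = {
    "a":      (0.1, 0.1, 0.1),
    "man":    (0.9, 0.1, 0.0),
    "person": (0.85, 0.15, 0.05),
    "riding": (0.1, 0.9, 0.1),
    "horse":  (0.0, 0.2, 0.9),
    "pony":   (0.05, 0.25, 0.85),
}


def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens found verbatim in the reference (exact match, no clipping)."""
    cand, ref = candidate.split(), reference.split()
    return sum(1 for w in cand if w in ref) / len(cand)


def mean_vector(sentence: str) -> tuple:
    """Average the toy vectors of all tokens (every token must be in TOY_EMBEDDINGS)."""
    vecs = [TOY_EMBEDDINGS[w] for w in sentence.split()]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(u: tuple, v: tuple) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def embedding_similarity(candidate: str, reference: str) -> float:
    """Cosine similarity of the mean word vectors of the two captions."""
    return cosine(mean_vector(candidate), mean_vector(reference))


if __name__ == "__main__":
    reference = "a man riding a horse"
    candidate = "a person riding a pony"
    # Only "a", "riding", "a" match exactly, so exact matching gives 3/5.
    print(unigram_precision(candidate, reference))  # 0.6
    # Synonyms have nearby toy vectors, so the embedding score stays high.
    print(embedding_similarity(candidate, reference))
```

Averaging word vectors is the simplest possible sentence representation. It ignores word order and is used here only to make the exact-match versus semantic-similarity contrast concrete.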

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques