Intrinsic Image Captioning Evaluation
Chao Zeng, Sam Kwong

TL;DR
This paper introduces I2CE, a learning-based metric for image captioning evaluation that captures semantic similarity and intrinsic information, complementing existing metrics.
Contribution
The paper proposes a novel, learning-based evaluation metric called I2CE, inspired by auto-encoder mechanisms and word embeddings, for more comprehensive caption assessment.
Findings
I2CE maintains robust performance across models.
It provides flexible scores for semantically similar captions.
It complements existing evaluation metrics.
Abstract
The image captioning task is to generate suitable descriptions of images. The task poses several challenges, such as accuracy, fluency and diversity. However, few metrics cover all of these properties when evaluating the results of captioning models. In this paper, we first conduct a comprehensive investigation of contemporary metrics. Motivated by the auto-encoder mechanism and research advances in word embeddings, we propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE). We select several state-of-the-art image captioning models and test their performance on the MS COCO dataset with respect to both contemporary metrics and the proposed I2CE. Experimental results show that our proposed method maintains robust performance and gives more flexible scores to candidate captions when encountered with…
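The abstract does not describe how I2CE is computed. As rough intuition for why an embedding-based score can be more "flexible" than exact word matching, the sketch below contrasts clipped unigram precision (a simplified stand-in for BLEU-1, without the brevity penalty) with the cosine similarity of averaged word vectors. This is not the I2CE auto-encoder model. The 3-dimensional vectors in TOY_EMBEDDINGS are hypothetical, hand-picked values, and the function names are illustrative only.

```python
import math

# Hypothetical toy word vectors (3-d), hand-picked for illustration only.
# A real embedding-based metric would use pretrained or learned embeddings.
TOY_EMBEDDINGS = {
    "a": [0.1, 0.1, 0.1],
    "man": [0.9, 0.1, 0.0],
    "person": [0.85, 0.15, 0.05],
    "riding": [0.1, 0.9, 0.1],
    "horse": [0.0, 0.2, 0.9],
    "pony": [0.05, 0.25, 0.85],
}


def unigram_precision(candidate, reference):
    """Fraction of candidate tokens found verbatim in the reference (clipped counts)."""
    ref_counts = {}
    for tok in reference:
        ref_counts[tok] = ref_counts.get(tok, 0) + 1
    matches = 0
    for tok in candidate:
        if ref_counts.get(tok, 0) > 0:
            matches += 1
            ref_counts[tok] -= 1  # each reference token can be matched only once
    return matches / len(candidate)


def mean_vector(tokens):
    """Average the toy embeddings of the tokens (all tokens assumed in vocabulary)."""
    dim = len(next(iter(TOY_EMBEDDINGS.values())))
    total = [0.0] * dim
    for tok in tokens:
        for i, value in enumerate(TOY_EMBEDDINGS[tok]):
            total[i] += value
    return [value / len(tokens) for value in total]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embedding_similarity(candidate, reference):
    """Cosine similarity of the mean word vectors of two captions."""
    return cosine(mean_vector(candidate), mean_vector(reference))


reference = "a man riding a horse".split()
candidate = "a person riding a pony".split()
print(unigram_precision(candidate, reference))  # 0.6
print(round(embedding_similarity(candidate, reference), 3))
```

Under exact matching, the paraphrase "a person riding a pony" receives no credit for "person" or "pony". The averaged-embedding score stays close to 1 because the toy vectors place those synonyms near "man" and "horse".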
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
