WEmbSim: A Simple yet Effective Metric for Image Captioning

Naeha Sharif; Lyndon White; Mohammed Bennamoun; Wei Liu; Syed Afaq Ali Shah

arXiv:2012.13137 · cs.CV · December 25, 2020

TL;DR

WEmbSim is a simple metric that scores image captions by the cosine similarity of their mean word embeddings. In unsupervised image caption evaluation it outperforms more complex metrics, correlating well with human judgments.

Contribution

The paper introduces WEmbSim, a straightforward metric for image caption evaluation that achieves higher system-level correlation with human judgments than more complex existing metrics such as SPICE, CIDEr, and WMD.

Findings

01

WEmbSim outperforms SPICE, CIDEr, and WMD in system-level correlation with human judgments.

02

It achieves the best accuracy at matching human consensus scores for caption pairs among commonly used unsupervised metrics.

03

WEmbSim sets a new baseline for unsupervised caption evaluation metrics.
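The two evaluation settings named in the findings can be illustrated with a small generic sketch. System-level correlation compares one aggregate metric score per captioning system with one human score per system; pairwise accuracy counts how often a metric prefers the same caption of a pair that human annotators preferred. This is not the paper's evaluation code: the use of Pearson correlation, the tie-handling rule, and every number below are assumptions chosen for illustration, not results from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def pairwise_accuracy(scores_a: Sequence[float], scores_b: Sequence[float],
                      human_prefers_a: Sequence[bool]) -> float:
    """Fraction of caption pairs where the metric prefers the caption humans preferred."""
    correct = 0
    for a, b, pref_a in zip(scores_a, scores_b, human_prefers_a):
        # Assumption: a tie (a == b) counts as the metric preferring caption B.
        metric_prefers_a = a > b
        correct += metric_prefers_a == pref_a
    return correct / len(human_prefers_a)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    metric_per_system = [0.71, 0.65, 0.80, 0.58]
    human_per_system = [3.4, 3.1, 3.9, 2.7]
    print("system-level Pearson r:", pearson(metric_per_system, human_per_system))
    print("pairwise accuracy:",
          pairwise_accuracy([0.9, 0.4, 0.6], [0.5, 0.7, 0.3], [True, False, False]))
```

In the pairwise example, the metric agrees with the human preference on the first two pairs and disagrees on the third, so the accuracy is two out of three.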

Abstract

The area of automatic image caption evaluation is still undergoing intensive research to address the need for captions that meet adequacy and fluency requirements. From our past attempts at developing highly sophisticated learning-based metrics, we have discovered that a simple cosine similarity measure using the Mean of Word Embeddings (MOWE) of captions can achieve surprisingly high performance on unsupervised caption evaluation. This inspires our proposed metric, WEmbSim, which beats complex measures such as SPICE, CIDEr and WMD at system-level correlation with human judgments. It also achieves the best accuracy at matching human consensus scores for caption pairs, compared with commonly used unsupervised methods. Therefore, we believe that WEmbSim sets a new baseline that any more complex metric must outperform to justify its complexity.
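The core idea in the abstract, cosine similarity between the Mean of Word Embeddings (MOWE) of two captions, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy embedding table `TOY_EMB` is hypothetical (a real setup would load pre-trained word vectors), tokenization is plain lowercase whitespace splitting, out-of-vocabulary words are skipped, and averaging the per-reference similarities is an assumed way of handling multiple reference captions. The function name `wembsim_sketch` is likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy embedding table for illustration only; a real setup would
# load pre-trained word vectors instead.
TOY_EMB: Dict[str, List[float]] = {
    "a": [0.1, 0.1, 0.1],
    "dog": [1.0, 0.0, 0.0],
    "puppy": [0.9, 0.1, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "runs": [0.0, 0.0, 1.0],
}


def mowe(caption: str, emb: Dict[str, List[float]]) -> List[float]:
    """Mean of Word Embeddings: average the vectors of in-vocabulary tokens."""
    # Assumption: lowercase whitespace tokenization; OOV tokens are skipped.
    vecs = [emb[t] for t in caption.lower().split() if t in emb]
    if not vecs:
        raise ValueError(f"no in-vocabulary tokens in {caption!r}")
    # zip(*vecs) yields one tuple per embedding dimension.
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def wembsim_sketch(candidate: str, references: Sequence[str],
                   emb: Dict[str, List[float]]) -> float:
    """Score a candidate caption against one or more reference captions."""
    cand = mowe(candidate, emb)
    # Assumption: per-reference cosine similarities are averaged.
    sims = [cosine(cand, mowe(ref, emb)) for ref in references]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    refs = ["a puppy runs"]
    print(wembsim_sketch("a dog runs", refs, TOY_EMB))  # higher score
    print(wembsim_sketch("a cat runs", refs, TOY_EMB))  # lower score
```

Because cosine similarity ignores vector length, averaging rather than summing the word vectors does not change the score; the mean is kept here only to match the MOWE name.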
