EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

Yaya Shi; Xu Yang; Haiyang Xu; Chunfeng Yuan; Bing Li; Weiming Hu; Zheng-Jun Zha

arXiv:2111.08919·cs.CV·July 19, 2022

TL;DR

EMScore is a new reference-free metric for video captioning that measures similarity between videos and captions using embedding matching, outperforming existing metrics in correlation with human judgment and ability to detect hallucinations.

Contribution

Proposes EMScore, a novel embedding matching-based, reference-free metric for video captioning that combines coarse- and fine-grained visual and linguistic similarity measures.
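The combination of coarse- and fine-grained matching can be illustrated with a minimal sketch. This is not the official implementation from shiyaya/emscore. It assumes that per-frame and per-token embeddings have already been extracted by a vision-language model, that the video-level and caption-level embeddings default to mean-pooled frame and token embeddings, that fine-grained matching is a greedy best-match precision/recall F1 over cosine similarities, and that the two scores are simply averaged. The function name `embedding_matching_score` and all parameter names are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def embedding_matching_score(frame_emb, token_emb, video_emb=None, caption_emb=None):
    """Hypothetical sketch of coarse- plus fine-grained embedding matching.

    frame_emb: (num_frames, dim) array of per-frame embeddings.
    token_emb: (num_tokens, dim) array of per-token caption embeddings.
    video_emb / caption_emb: optional global embeddings; when omitted they are
    approximated by mean pooling (an assumption of this sketch).
    """
    frames = l2_normalize(np.asarray(frame_emb, dtype=float))
    tokens = l2_normalize(np.asarray(token_emb, dtype=float))

    if video_emb is None:
        video_emb = frames.mean(axis=0)
    if caption_emb is None:
        caption_emb = tokens.mean(axis=0)

    # Coarse-grained: cosine similarity between the global video and caption vectors.
    coarse = float(l2_normalize(np.asarray(video_emb, dtype=float))
                   @ l2_normalize(np.asarray(caption_emb, dtype=float)))

    # Fine-grained: greedy matching on the token-by-frame similarity matrix.
    sim = tokens @ frames.T                    # shape (num_tokens, num_frames)
    precision = float(sim.max(axis=1).mean())  # each token matched to its best frame
    recall = float(sim.max(axis=0).mean())     # each frame matched to its best token
    denom = precision + recall
    fine = 0.0 if abs(denom) < 1e-12 else 2.0 * precision * recall / denom

    # Assumed combination rule for this sketch: plain average of both scores.
    return {"coarse": coarse, "fine": fine, "combined": (coarse + fine) / 2.0}
```

Because no reference caption appears anywhere in the inputs, a score of this form can be computed for any video and candidate caption pair, which is what makes the metric reference-free.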

Findings

01

EMScore correlates more strongly with human judgments than existing metrics.

02

EMScore effectively detects hallucinating captions.

03

The datasets VATEX-EVAL and ActivityNet-FOIL are introduced for evaluation.

Abstract

Current metrics for video captioning are mostly based on the text-level comparison between reference and candidate captions. However, they have some insuperable drawbacks, e.g., they cannot handle videos without references, and they may result in biased evaluation due to the one-to-many nature of video-to-text and the neglect of visual relevance. From the human evaluator's viewpoint, a high-quality caption should be consistent with the provided video, but not necessarily be similar to the reference literally or semantically. Inspired by human evaluation, we propose EMScore (Embedding Matching-based score), a novel reference-free metric for video captioning, which directly measures similarity between video and candidate captions. Benefiting from the recent development of large-scale pre-training models, we exploit a well pre-trained vision-language model to extract visual and linguistic…

Code & Models

Repositories

shiyaya/emscore
PyTorch · Official
