UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning

Hwanhee Lee; Seunghyun Yoon; Franck Dernoncourt; Trung Bui; Kyomin Jung

arXiv:2106.14019·cs.CL·June 29, 2021

TL;DR

UMIC is a novel unreferenced image captioning metric leveraging contrastive learning with Vision-and-Language BERT, outperforming existing metrics in correlation without needing reference captions.

Contribution

Introduces UMIC, a reference-free image captioning evaluation metric trained with contrastive learning, and provides a new human-annotated benchmark dataset.

Findings

01

UMIC outperforms previous metrics in correlation with human judgments.

02

UMIC does not require reference captions for evaluation.

03

A new human-annotated dataset for image captioning is released.
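The first finding is stated in terms of correlation with human judgments. As a rough illustration of what such a comparison involves, the sketch below computes Kendall's tau-a between metric scores and human ratings in plain Python. The choice of Kendall's tau is an assumption made here for illustration (this page does not say which coefficient the authors report), and the function name and sample values are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Illustrative only; tied pairs count as neither concordant nor discordant.
    """
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("need at least two items to rank")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether both rankings order the pair the same way.
        agreement = (metric_scores[i] - metric_scores[j]) * (human_scores[i] - human_scores[j])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical values: three captions scored by a metric and rated by annotators.
tau = kendall_tau_a([0.2, 0.5, 0.9], [1.0, 3.0, 2.0])
```

A value near 1 means the metric ranks captions the way annotators do; a value near -1 means it ranks them in the opposite order.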

Abstract

Despite the success of various text generation metrics such as BERTScore, it is still difficult to evaluate the image captions without enough reference captions due to the diversity of the descriptions. In this paper, we introduce a new metric UMIC, an Unreferenced Metric for Image Captioning which does not require reference captions to evaluate image captions. Based on Vision-and-Language BERT, we train UMIC to discriminate negative captions via contrastive learning. Also, we observe critical problems of the previous benchmark dataset (i.e., human annotations) on image captioning metric, and introduce a new collection of human annotations on the generated captions. We validate UMIC on four datasets, including our new dataset, and show that UMIC has a higher correlation than all previous metrics that require multiple references. We release the benchmark dataset and pre-trained models to…
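The abstract says UMIC is trained to discriminate negative captions via contrastive learning but does not give the objective on this page. One common way to express that idea is a margin-based ranking loss that pushes the score of the correct caption above the score of a negative caption for the same image. The sketch below assumes that form; the function name, the margin value, and the scores are hypothetical, and the scores stand in for the outputs of a vision-and-language model.

```python
from typing import Sequence


def margin_ranking_loss(
    positive_scores: Sequence[float],
    negative_scores: Sequence[float],
    margin: float = 0.2,  # hypothetical margin, not taken from the paper
) -> float:
    """Mean hinge loss max(0, margin - (s_pos - s_neg)) over paired scores.

    Each pair holds the model's score for an image with its correct caption
    (positive) and with a corrupted or mismatched caption (negative).
    """
    if len(positive_scores) != len(negative_scores):
        raise ValueError("positive and negative scores must be paired")
    if not positive_scores:
        raise ValueError("need at least one pair")

    losses = [
        max(0.0, margin - (pos - neg))
        for pos, neg in zip(positive_scores, negative_scores)
    ]
    return sum(losses) / len(losses)


# A well-separated pair contributes no loss; an unseparated pair is penalized.
loss_separated = margin_ranking_loss([0.9], [0.1])
loss_tied = margin_ranking_loss([0.5], [0.5])
```

With this form, the loss reaches zero once every positive caption outscores its negative by at least the margin, so training effort concentrates on the pairs the model still confuses.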

Code & Models

Repositories

hwanheelee1993/UMIC (PyTorch, official)

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · WordPiece · Adam · Dropout · Layer Normalization · Linear Warmup With Linear Decay