ContextRef: Evaluating Referenceless Metrics For Image Description Generation

Elisa Kreiss; Eric Zelikman; Christopher Potts; Nick Haber

arXiv:2309.11710 · cs.CL · September 22, 2023

TL;DR

This paper introduces ContextRef, a benchmark for evaluating referenceless image description metrics, revealing current models' limitations and the importance of context in aligning with human judgments.

Contribution

The paper presents ContextRef, a new benchmark with human ratings and robustness checks to evaluate and improve referenceless image description metrics.
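This page does not say how a metric's agreement with the human ratings or its behavior on the robustness checks is scored. The following is a minimal sketch under stated assumptions: Spearman rank correlation is assumed for the human-rating component, and a pairwise "original must beat perturbed" test is assumed for a robustness check. The function names and toy numbers are hypothetical and are not taken from the elisakreiss/contextref repository.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("correlation is undefined for constant inputs")
    return cross / (norm_a * norm_b)


def spearman(metric_scores: Sequence[float], human_ratings: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed inputs."""
    if len(metric_scores) != len(human_ratings) or len(metric_scores) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    return pearson(average_ranks(metric_scores), average_ranks(human_ratings))


def robustness_pass_rate(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (original, perturbed) score pairs where the metric strictly
    prefers the original description; ties count as failures."""
    return sum(1 for original, perturbed in pairs if original > perturbed) / len(pairs)


if __name__ == "__main__":
    # Hypothetical metric scores and human ratings for four descriptions.
    print(spearman([0.10, 0.40, 0.35, 0.80], [1, 3, 2, 5]))
    # Hypothetical scores for originals vs. perturbed versions (e.g. shuffled words).
    print(robustness_pass_rate([(0.9, 0.5), (0.4, 0.6), (0.7, 0.7), (0.8, 0.2)]))
```

With these toy numbers the metric orders the four descriptions exactly as the humans do, so rho is 1 (up to floating-point rounding). The pass rate is 0.5 because one pair is inverted and one is tied. Counting ties as failures is a deliberate choice here: a metric that cannot tell an intact description from a damaged one has not passed the check.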

Findings

1. Current models fail to perform well on ContextRef.
2. Fine-tuning improves model performance significantly.
3. Context dependence remains a major challenge.

Abstract

Referenceless metrics (e.g., CLIPScore) use pretrained vision-language models to assess image descriptions directly without costly ground-truth reference texts. Such methods can facilitate rapid progress, but only if they truly align with human preference judgments. In this paper, we introduce ContextRef, a benchmark for assessing referenceless metrics for such alignment. ContextRef has two components: human ratings along a variety of established quality dimensions, and ten diverse robustness checks designed to uncover fundamental weaknesses. A crucial aspect of ContextRef is that images and descriptions are presented in context, reflecting prior work showing that context is important for description quality. Using ContextRef, we assess a variety of pretrained models, scoring functions, and techniques for incorporating context. None of the methods is successful with ContextRef, but we…
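CLIPScore, the example metric named in the abstract, was defined by Hessel et al. (2021) as w · max(cos(E_I, E_C), 0) with w = 2.5, where E_I and E_C are the CLIP embeddings of the image and the candidate description. The sketch below computes that formula from precomputed embedding vectors, which shows why no reference text is needed. It assumes the embeddings already exist (toy 2-d vectors stand in for real model outputs), the helper names are hypothetical, and it does not incorporate context, since this page does not describe how ContextRef's context-aware variants work.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def clip_score(image_emb: Sequence[float], text_emb: Sequence[float], w: float = 2.5) -> float:
    """CLIPScore as defined by Hessel et al. (2021): w * max(cos(image, text), 0).

    The score depends only on the image and the candidate description; no
    reference text is involved. Embeddings are assumed to be precomputed by a
    pretrained vision-language model such as CLIP.
    """
    return w * max(cosine_similarity(image_emb, text_emb), 0.0)


if __name__ == "__main__":
    # Hypothetical 2-d embeddings, for illustration only.
    image_emb = [1.0, 0.0]
    print(clip_score(image_emb, [1.0, 1.0]))   # partially aligned description
    print(clip_score(image_emb, [-1.0, 0.0]))  # opposite direction, clamped to 0
```

For example, `clip_score([1.0, 0.0], [1.0, 1.0])` gives 2.5 / √2 ≈ 1.768. Because of the clamp at zero, every description whose embedding points away from the image scores exactly 0, so the metric cannot rank among those descriptions.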

Code & Models

Repositories

elisakreiss/contextref
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN