Quality Estimation for Image Captions Based on Large-scale Human Evaluations

Tomer Levinboim; Ashish V. Thapliyal; Piyush Sharma; Radu Soricut

arXiv:1909.03396·cs.CL·June 3, 2021

TL;DR

This paper introduces a large-scale human evaluation dataset for estimating the quality of image captions without ground-truth references, enabling better filtering of low-quality captions in real-world applications.

Contribution

It presents a new human evaluation process, a large dataset of over 600k ratings, and baseline models for caption quality estimation that improve caption filtering.

Findings

1. QE models trained on coarse ratings effectively detect low-quality captions
2. Large-scale dataset enables robust training of caption quality estimators
3. Filtering low-quality captions improves user experience

Abstract

Automatic image captioning has improved significantly over the last few years, but the problem is far from solved: state-of-the-art models still often produce low-quality captions when used in the wild. In this paper, we focus on the task of Quality Estimation (QE) for image captions, which attempts to model caption quality from a human perspective and without access to ground-truth references, so that it can be applied at prediction time to detect low-quality captions produced on previously unseen images. For this task, we develop a human evaluation process that collects coarse-grained caption annotations from crowdsourced users, and we use it to collect a large-scale dataset spanning more than 600k caption quality ratings. We then carefully validate the quality of the collected ratings and establish baseline models for this new QE task. Finally, we further…
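
To make the prediction-time use of a QE model concrete, here is a minimal sketch of threshold-based caption filtering. It is an illustration under stated assumptions, not the paper's implementation: the `Caption` type, the `filter_captions` helper, the `qe_score` callable (standing in for any trained QE model that returns a score in [0, 1]), and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Caption:
    """A generated caption paired with the image it describes (hypothetical type)."""
    image_id: str
    text: str


def filter_captions(
    captions: List[Caption],
    qe_score: Callable[[Caption], float],
    threshold: float = 0.5,
) -> List[Caption]:
    """Keep only captions whose estimated quality is at least `threshold`.

    `qe_score` is an assumed stand-in for a trained QE model; it needs no
    ground-truth reference caption, only the image/caption pair.
    """
    return [c for c in captions if qe_score(c) >= threshold]


# Toy usage with made-up scores in place of a real model.
captions = [
    Caption("img-1", "a dog running on a beach"),
    Caption("img-2", "a a a picture of of"),
]
toy_scores = {"img-1": 0.9, "img-2": 0.2}
kept = filter_captions(captions, lambda c: toy_scores[c.image_id])
print([c.image_id for c in kept])
```

The threshold sets the trade-off between showing more captions and showing fewer bad ones; a suitable value would have to be chosen on held-out human ratings.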

Code & Models

Repositories

google-research-datasets/Image-Caption-Quality-Dataset (Official)
