On the use of human reference data for evaluating automatic image descriptions

Emiel van Miltenburg

arXiv:2006.08792 · cs.CL · June 17, 2020

Open Access

TL;DR

This paper discusses the limitations of current human-generated image description datasets and emphasizes the need for improved, detailed reference data to better evaluate and develop automatic image description systems, especially for visually impaired users.

Contribution

It highlights the insufficiency of existing datasets and advocates for more detailed guidelines and alternative evaluation methods for image descriptions.

Findings

1. Current datasets are of insufficient quality.

2. Improved guidelines are needed for description generation.

3. Evaluation should consider alternative methods beyond reference similarity.

Abstract

Automatic image description systems are commonly trained and evaluated using crowdsourced, human-generated image descriptions. The best-performing system is then determined using some measure of similarity to the reference data (BLEU, METEOR, CIDEr, etc.). Thus, both the quality of the systems and the quality of the evaluation depend on the quality of the descriptions. As Section 2 will show, the quality of current image description datasets is insufficient. I argue that there is a need for more detailed guidelines that take into account the needs of visually impaired users, but also the feasibility of generating suitable descriptions. With high-quality data, evaluation of image description systems could use reference descriptions, but we should also look for alternatives.
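
To make the dependence on reference quality concrete, here is a minimal sketch of a reference-similarity score. It is not BLEU, METEOR, or CIDEr as used in the paper; it is a simplified clipped unigram precision (the building block behind BLEU's modified n-gram precision), with whitespace tokenization. The function name `clipped_unigram_precision` and the example sentences are assumptions made for illustration only.

```python
from collections import Counter


def clipped_unigram_precision(candidate: str, references: list[str]) -> float:
    """Share of candidate tokens that also appear in a reference.

    Each token's count is clipped to the highest count found in any
    single reference, so repeating a matching word is not rewarded.
    This is an illustrative simplification, not a full BLEU score.
    """
    cand_counts = Counter(candidate.lower().split())
    if not cand_counts:
        return 0.0

    # For every token, keep the largest count seen in any one reference.
    max_ref_counts: Counter = Counter()
    for ref in references:
        for token, count in Counter(ref.lower().split()).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)

    matched = sum(
        min(count, max_ref_counts[token]) for token, count in cand_counts.items()
    )
    return matched / sum(cand_counts.values())


# Hypothetical data: the same system output scored against two reference sets.
candidate = "a man rides a horse"
good_refs = ["a man rides a brown horse", "a person riding a horse on a beach"]
poor_refs = ["nice picture", "horse"]

good = clipped_unigram_precision(candidate, good_refs)
poor = clipped_unigram_precision(candidate, poor_refs)
print(f"score against detailed references: {good:.2f}")
print(f"score against sparse references:   {poor:.2f}")
```

The candidate description is identical in both calls; only the references change. A score of this kind therefore reflects the quality of the reference data as much as the quality of the system output.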

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques