Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

Wanrong Zhu; Xin Eric Wang; Pradyumna Narayana; Kazoo Sone; Sugato Basu; William Yang Wang

arXiv:2010.03644 · cs.CL · October 9, 2020 · 1 citation

Open Access

TL;DR

This paper investigates how sample variance in multi-reference datasets impacts the evaluation of visually grounded language generation models, emphasizing the importance of reporting variance and analyzing dataset reliability.

Contribution

It introduces experimental analyses of sample variance effects in vision-and-language tasks, highlighting the need for variance reporting and dataset reliability considerations.

Findings

01. CIDEr metric shows larger variance than others

02. Human references vary significantly across datasets and tasks

03. Reporting variance is crucial for reliable evaluation
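To make the idea of sample variance concrete, here is a minimal sketch, not the paper's protocol. It scores one candidate caption against randomly sampled subsets of a multi-reference set and reports the mean and standard deviation of the score. The metric `unigram_f1` is a toy stand-in chosen for self-containment (it is not CIDEr), and the captions, function names, and parameters `k`, `trials`, and `seed` are all hypothetical.

```python
import random
import statistics


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy stand-in metric: F1 over unique lowercase tokens (not CIDEr)."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_with_sampled_references(candidate, references, k, trials, seed=0):
    """Score the candidate against `trials` random size-k reference subsets.

    Each trial keeps the best-matching reference in the subset.
    Returns (mean, population standard deviation) across trials.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        subset = rng.sample(references, k)
        scores.append(max(unigram_f1(candidate, r) for r in subset))
    return statistics.mean(scores), statistics.pstdev(scores)


# Hypothetical captions, invented for illustration only.
references = [
    "a man rides a horse on the beach",
    "a person riding a horse near the ocean",
    "horse and rider walking along the shore",
    "someone on horseback by the sea at sunset",
    "a man on a horse",
]
candidate = "a man riding a horse on the beach"

mean_1, std_1 = score_with_sampled_references(candidate, references, k=1, trials=200)
mean_all, std_all = score_with_sampled_references(candidate, references, k=5, trials=200)
print(f"1 reference:  mean={mean_1:.3f}, std={std_1:.3f}")
print(f"5 references: mean={mean_all:.3f}, std={std_all:.3f}")
```

With a single sampled reference the score depends on which annotator's caption happens to be drawn, so the standard deviation is nonzero. With all five references every trial sees the same set and the spread collapses to zero. Reporting the spread alongside the mean shows how much of a score comes from the choice of references.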

Abstract

A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings. To do this, it is critical to ensure that our evaluation protocols are correct, and benchmarks are reliable. In this work, we set forth to design a set of experiments to understand an important but often ignored problem in visually grounded language generation: given that humans have different utilities and visual attention, how will the sample variance in multi-reference datasets affect the models' performance? Empirically, we study several multi-reference datasets and corresponding vision-and-language tasks. We show that it is of paramount importance to report variance in experiments; that human-generated references could vary drastically in different datasets/tasks, revealing the nature of each task; that metric-wise, CIDEr has…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition