CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning

Alessandro Suglia; Ioannis Konstas; Andrea Vanzo; Emanuele Bastianelli; Desmond Elliott; Stella Frank; Oliver Lemon

arXiv:2006.02174 · cs.CL · June 4, 2020 · 1 citation

TL;DR

This paper introduces GROLLA, a comprehensive evaluation framework with three sub-tasks for assessing grounded language learning models, and presents the CompGuessWhat?! dataset to evaluate attribute grounding and generalization in neural representations.

Contribution

It proposes a multi-task evaluation framework and a new dataset to better assess the quality and generalization of grounded language learning models.

Findings

01

Current models have a limited ability to encode object attributes (average F1 of 44.27).

02

Models struggle with zero-shot generalization, achieving only 50.06% accuracy.

03

The framework highlights the need for more expressive and robust representations.
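
Finding 01 reports attribute prediction quality as an average F1 score. As a rough illustration of how such a score can be computed, the sketch below assumes each object has a set of gold attribute labels and a set of predicted labels, computes a per-object set-based F1, and averages the per-object scores. The function names, the per-object averaging, and the 0 to 100 scaling are assumptions made for illustration; the paper's exact evaluation protocol may differ.

```python
from typing import Iterable, Set, Tuple


def attribute_f1(predicted: Set[str], gold: Set[str]) -> float:
    """Set-based F1 between the predicted and gold attributes of one object."""
    if not predicted and not gold:
        # Assumption: two empty attribute sets count as perfect agreement.
        return 1.0
    true_pos = len(predicted & gold)
    if true_pos == 0:
        # Covers the case where either set is empty, avoiding division by zero.
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_f1(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Unweighted mean of per-object F1 scores, scaled to 0-100 (assumed convention)."""
    scores = [attribute_f1(pred, gold) for pred, gold in pairs]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical (predicted, gold) attribute sets for two objects.
    examples = [
        ({"red", "round"}, {"red", "round", "shiny"}),
        ({"wooden"}, {"metal"}),
    ]
    print(average_f1(examples))
```

In this toy example the first object has precision 1.0 and recall 2/3 (F1 of 0.8) and the second shares no attributes with its gold set (F1 of 0.0), so the average is about 40. Averaging per object, rather than pooling all attribute decisions, is only one possible choice.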

Abstract

Approaches to Grounded Language Learning typically focus on a single task-based final performance measure that may not depend on desirable properties of the learned hidden representations, such as their ability to predict salient attributes or to generalise to unseen situations. To remedy this, we present GROLLA, an evaluation framework for Grounded Language Learning with Attributes with three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation. We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations, in particular concerning attribute grounding. To this end, we extend the original GuessWhat?! dataset by including a semantic layer on top of the perceptual one. Specifically, we enrich the VisualGenome scene graphs associated with the GuessWhat?!…
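
The abstract describes GROLLA as three sub-tasks evaluated side by side. The sketch below is a hypothetical container that keeps the three sub-task scores together and summarises them with an unweighted mean. The class name `GrollaScores`, its field names, the 0 to 100 range check, and the mean as an aggregate are all assumptions for illustration, not the paper's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrollaScores:
    """Hypothetical holder for the three GROLLA sub-task scores, each in [0, 100]."""

    goal_oriented: float         # 1) goal-oriented evaluation (e.g. game success rate)
    attribute_prediction: float  # 2) object attribute prediction (e.g. average F1)
    zero_shot: float             # 3) zero-shot evaluation (e.g. accuracy on unseen cases)

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 0-100 scale.
        for name in ("goal_oriented", "attribute_prediction", "zero_shot"):
            value = getattr(self, name)
            if not 0.0 <= value <= 100.0:
                raise ValueError(f"{name} must be in [0, 100], got {value}")

    def aggregate(self) -> float:
        """Unweighted mean of the three sub-task scores (an assumed summary)."""
        return (self.goal_oriented + self.attribute_prediction + self.zero_shot) / 3


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    scores = GrollaScores(goal_oriented=60.0, attribute_prediction=40.0, zero_shot=50.0)
    print(scores.aggregate())
```

Reporting the three scores separately, and only then averaging them, keeps visible the point the abstract makes: a model can do well on the goal-oriented task while its representations fail on attributes or unseen situations.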

Code & Models

Datasets

asuglia/compguesswhat
dataset · 92 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques