A Benchmark for Systematic Generalization in Grounded Language Understanding

Laura Ruis; Jacob Andreas; Marco Baroni; Diane Bouchacourt; Brenden M. Lake

arXiv:2003.05161 · cs.CL · October 20, 2020 · 45 cites

TL;DR

This paper introduces gSCAN, a new benchmark for evaluating how well models can generalize compositionally in grounded language understanding, highlighting current models' limitations in systematic generalization.

Contribution

The paper presents gSCAN, a new benchmark grounded in a grid world for evaluating compositional generalization in situated language understanding. It goes beyond prior benchmarks that focused on the syntactic aspects of generalization.
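Compositional benchmarks of this kind typically test generalization by holding out a combination of familiar parts: every example that refers to that combination goes to the test set, while each part still appears in training. The sketch below shows that idea under simplifying assumptions. The `Example` record, its fields, and the helper names `held_out_split` and `parts_seen_in_training` are hypothetical and are not gSCAN's actual data format or split-generation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    command: str       # natural-language instruction (hypothetical format)
    target_color: str  # color of the object the command refers to
    target_shape: str  # shape of the object the command refers to


def held_out_split(examples, color, shape):
    """Route every example whose target is the (color, shape) pair to the test set."""
    train, test = [], []
    for ex in examples:
        if (ex.target_color, ex.target_shape) == (color, shape):
            test.append(ex)
        else:
            train.append(ex)
    return train, test


def parts_seen_in_training(train, color, shape):
    """Check that the held-out combination is novel while its parts are familiar."""
    colors = {ex.target_color for ex in train}
    shapes = {ex.target_shape for ex in train}
    return color in colors and shape in shapes


data = [
    Example("walk to the red circle", "red", "circle"),
    Example("walk to the blue square", "blue", "square"),
    Example("walk to the red square", "red", "square"),
]
train, test = held_out_split(data, "red", "square")
print(len(train), len(test))                           # 2 1
print(parts_seen_in_training(train, "red", "square"))  # True
```

Here "red" and "square" each occur in training, but never together as a target, so solving the test example requires combining them.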

Findings

01

Models struggle with systematic compositional generalization

02

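A strong multimodal baseline and a state-of-the-art compositional method fail dramatically in most cases where generalization requires systematic compositional rules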

03

gSCAN enables evaluation of linguistically motivated rule learning
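Failures like these are usually reported as per-split accuracy over predicted action sequences. The sketch below assumes exact-match accuracy, a standard metric for sequence-output benchmarks of this kind, where a prediction counts only if the entire action sequence equals the target. The split names and action strings are hypothetical placeholders, not gSCAN's official identifiers.

```python
from collections import defaultdict


def exact_match_by_split(records):
    """Exact-match accuracy per split.

    records: iterable of (split_name, predicted_actions, target_actions).
    A prediction counts as correct only if the whole action sequence matches.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for split, predicted, target in records:
        total[split] += 1
        correct[split] += int(list(predicted) == list(target))
    return {split: correct[split] / total[split] for split in total}


records = [
    ("random", ["walk", "walk"], ["walk", "walk"]),
    ("random", ["walk"], ["walk", "walk"]),  # one step short: counted as wrong
    ("red_squares", ["turn left", "walk"], ["turn left", "walk"]),
]
print(exact_match_by_split(records))  # {'random': 0.5, 'red_squares': 1.0}
```

Exact match is deliberately strict: a single missing or extra action makes the whole episode count as a failure.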

Abstract

Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts ("greet the pink brontosaurus by the ferris wheel"). Modern neural networks, by contrast, struggle to interpret novel compositions. In this paper, we introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding. Going beyond a related benchmark that focused on syntactic aspects of generalization, gSCAN defines a language grounded in the states of a grid world, facilitating novel evaluations of acquiring linguistically motivated rules. For example, agents must understand how adjectives such as 'small' are interpreted relative to the current world state or how adverbs such as 'cautiously' combine with new verbs. We test a strong multi-modal baseline model and a state-of-the-art compositional method finding that, in most cases, they fail…
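The abstract names two kinds of rules: adjectives such as "small" whose meaning depends on the current world state, and adverbs such as "cautiously" that must combine with new verbs. The sketch below illustrates both under stated assumptions. Size adjectives are assumed to compare only objects of the named shape, and "cautiously" is assumed, purely for illustration, to mean "look left and right before every step". The names `Obj`, `resolve`, `cautiously`, and `execute`, the integer size scale, and the action strings are all hypothetical; this is not the benchmark's actual grammar or simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    color: str
    shape: str
    size: int  # hypothetical integer size; larger means bigger


def resolve(objects, shape, adjective):
    """Find the referent of 'the <adjective> <shape>' in the current world state.

    Assumption: size adjectives compare only objects of the named shape.
    """
    candidates = [o for o in objects if o.shape == shape]
    if not candidates:
        return None
    if adjective == "small":
        return min(candidates, key=lambda o: o.size)
    if adjective == "big":
        return max(candidates, key=lambda o: o.size)
    raise ValueError(f"unsupported adjective: {adjective!r}")


def cautiously(steps):
    """Hypothetical adverb: look left and right before every movement step."""
    out = []
    for step in steps:
        out += ["turn left", "turn right", "turn right", "turn left", step]
    return out


def execute(verb, distance, adverb=None):
    """Walk `distance` steps (optionally modified by an adverb), then apply the verb."""
    moves = ["walk"] * distance
    if adverb is not None:
        moves = adverb(moves)
    # Hypothetical verb-specific action once the target is reached.
    final = {"walk": [], "push": ["push"], "pull": ["pull"]}[verb]
    return moves + final


red = Obj("red", "circle", 2)
print(resolve([red, Obj("blue", "circle", 1)], "circle", "big") == red)     # True
print(resolve([red, Obj("green", "circle", 4)], "circle", "small") == red)  # True
print(execute("pull", 1, cautiously))
# ['turn left', 'turn right', 'turn right', 'turn left', 'walk', 'pull']
```

The same red circle is "big" in one world and "small" in another, so the adjective cannot be memorized as a fixed property. Because the adverb is a function over movement steps, it applies to "pull" even if "pull cautiously" never appeared in training; a model that learns this kind of rule generalizes systematically.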

Video: A Benchmark for Systematic Generalization in Grounded Language Understanding (SlidesLive)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications