Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?

Linlu Qiu; Hexiang Hu; Bowen Zhang; Peter Shaw; Fei Sha

arXiv:2109.12243 · cs.CL · September 28, 2021 · 1 citation

TL;DR

This paper analyzes the gSCAN benchmark for grounded language understanding. It shows that a general-purpose Transformer model with cross-modal attention performs well on most splits but still faces fundamental challenges in systematic generalization and data efficiency, which motivates the new tasks the authors propose.

Contribution

The paper demonstrates that a general-purpose Transformer model performs strongly on gSCAN, and it introduces challenging new tasks, built around relations between objects in the visual environment, to probe the systematic generalization problems that remain.

Findings

01 Transformer models outperform more specialized methods on a majority of the gSCAN splits.

02 Many of the remaining errors point to the same fundamental challenge: systematic generalization of linguistic constructs, regardless of visual context.

03 Current models are data inefficient given the narrow scope of gSCAN commands.

Abstract

We analyze the grounded SCAN (gSCAN) benchmark, which was recently proposed to study systematic generalization for grounded language understanding. First, we study which aspects of the original benchmark can be solved by commonly used methods in multi-modal research. We find that a general-purpose Transformer-based model with cross-modal attention achieves strong performance on a majority of the gSCAN splits, surprisingly outperforming more specialized approaches from prior work. Furthermore, our analysis suggests that many of the remaining errors reveal the same fundamental challenge in systematic generalization of linguistic constructs regardless of visual context. Second, inspired by this finding, we propose challenging new tasks for gSCAN by generating data to incorporate relations between objects in the visual environment. Finally, we find that current models are surprisingly data…
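The abstract refers to a Transformer-based model with cross-modal attention. As a rough illustration of that general mechanism only, the sketch below lets command-token embeddings attend over grid-cell features with single-head scaled dot-product attention. It is not the authors' architecture: the function name `cross_modal_attention`, the feature size, the number of tokens, and the 6x6 grid are all assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text, grid, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper, not from the paper).

    text: (T, d) command token embeddings, used as queries.
    grid: (N, d) flattened grid-cell features, used as keys and values.
    """
    q = text @ w_q                               # (T, d)
    k = grid @ w_k                               # (N, d)
    v = grid @ w_v                               # (N, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (T, N) scaled dot products
    weights = softmax(scores, axis=-1)           # each token's distribution over cells
    return weights @ v, weights

# Toy inputs: all sizes are illustrative assumptions.
rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(4, d))     # 4 command tokens
grid = rng.normal(size=(36, d))    # 6x6 grid, flattened to 36 cells
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

attended, weights = cross_modal_attention(text, grid, w_q, w_k, w_v)
```

Here `attended` holds one visually grounded vector per command token, and each row of `weights` shows how strongly that token attends to each grid cell. A full model would typically stack several such layers with multiple heads, residual connections, and attention in both directions.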

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling