When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks

Ankur Sikarwar; Arkil Patel; Navin Goyal

arXiv:2210.12786·cs.CL·November 1, 2022

TL;DR

This paper shows that a simple transformer-based model can perform grounded compositional reasoning in grid-world navigation tasks, outperforming specialized architectures on ReaSCAN and a modified version of gSCAN. It also analyzes when transformers generalize and what computations underlie their behavior.

Contribution

It introduces a simple transformer-based model that surpasses specialized architectures on ReaSCAN and a modified version of gSCAN, and offers a mathematical analysis of its reasoning process.

Findings

01. Transformers outperform specialized architectures on ReaSCAN and a modified version of gSCAN.

02. The ReaSCAN split that tests depth generalization is unfair; on an amended version of this split, transformers can generalize to deeper input structures.

03. A single self-attention layer with a single head can generalize to new combinations of object attributes.
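To make the single-head finding concrete, here is a minimal sketch of how one attention head could pick out an object described by a color and a shape. It is a toy illustration under stated assumptions, not the paper's model or its derived construction: the attribute vocabularies, the `encode`, `attend`, and `locate` helpers, the identity query/key projections, and the hand-set `scale` are all hypothetical. The point it shows is that when attributes are encoded additively, the attention score decomposes per attribute, so a combination never seen together can still receive the highest score.

```python
import math

# Hypothetical attribute vocabularies (not taken from the paper).
COLORS = ["red", "green", "blue"]
SHAPES = ["circle", "square", "cylinder"]


def encode(color, shape):
    """Concatenate a one-hot color vector and a one-hot shape vector."""
    vec = [0.0] * (len(COLORS) + len(SHAPES))
    vec[COLORS.index(color)] = 1.0
    vec[len(COLORS) + SHAPES.index(shape)] = 1.0
    return vec


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, scale=10.0):
    """Single-head dot-product attention weights.

    Assumption: query and key projections are the identity, and `scale`
    is a hand-set sharpness factor rather than a learned parameter.
    """
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


def locate(command, grid_objects):
    """Return the index of the most-attended object and all weights."""
    query = encode(*command)
    keys = [encode(color, shape) for color, shape in grid_objects]
    weights = attend(query, keys)
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights


# A toy grid: only one object matches both attributes of the command.
grid = [
    ("red", "circle"),
    ("blue", "square"),
    ("red", "square"),
    ("green", "cylinder"),
]
target_index, weights = locate(("red", "square"), grid)
print(target_index, [round(w, 4) for w in weights])
```

In this sketch an object matching both attributes scores twice as high as one matching a single attribute, so the softmax concentrates on it regardless of which color and shape pairs appeared together before. A learned model would have to discover comparable projections from data.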

Abstract

Humans can reason compositionally whilst grounding language utterances to the real world. Recent benchmarks like ReaSCAN use navigation tasks grounded in a grid world to assess whether neural models exhibit similar capabilities. In this work, we present a simple transformer-based model that outperforms specialized architectures on ReaSCAN and a modified version of gSCAN. On analyzing the task, we find that identifying the target location in the grid world is the main challenge for the models. Furthermore, we show that a particular split in ReaSCAN, which tests depth generalization, is unfair. On an amended version of this split, we show that transformers can generalize to deeper input structures. Finally, we design a simpler grounded compositional generalization task, RefEx, to investigate how transformers reason compositionally. We show that a single self-attention layer with a single…

Code & Models

Repositories

ankursikarwar/grounded-compositional-generalization
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems