Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks
Brenden M. Lake, Marco Baroni

TL;DR
This paper tests whether sequence-to-sequence RNNs can generalize compositionally. The networks generalize well when test commands differ only slightly from training commands, but they fail when generalization requires systematic composition.
Contribution
The study introduces the SCAN domain for testing compositional generalization and shows that RNNs trained on it lack the systematic compositional skills that humans display in language.
Findings
RNNs succeed with small command variations
RNNs fail at systematic compositional generalization
Lack of systematicity may explain neural networks' data hunger
Abstract
Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax." In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can make successful zero-shot generalizations when the differences between training and test commands are small, so that they can apply "mix-and-match" strategies to solve the task. However, when generalization requires systematic compositional skills (as in the "dax" example above), RNNs fail spectacularly. We conclude with a…
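To make the command-to-action pairing concrete, here is a minimal sketch of a SCAN-like interpreter together with a held-out-verb split. It is an illustration under stated assumptions, not the authors' grammar or code: the vocabulary (`walk`, `run`, `jump`, `look`), the modifiers (`twice`, `thrice`, `and`), the uppercase action tokens, and the helper names `interpret` and `held_out_split` are all hypothetical and cover only a small fragment of what a full command language would need.

```python
# Toy SCAN-like interpreter. Everything here is an illustrative assumption:
# the vocabulary, the action token names, and the helper functions are not
# taken from the paper.
PRIMITIVES = {"walk": "WALK", "run": "RUN", "jump": "JUMP", "look": "LOOK"}
REPEATS = {"twice": 2, "thrice": 3}


def interpret(command, primitives=PRIMITIVES):
    """Map a command string to a list of action tokens."""
    actions = []
    # "and" joins sub-commands that are executed left to right.
    for clause in command.split(" and "):
        words = clause.split()
        verb = words[0]
        # An optional second word repeats the verb's action.
        count = REPEATS[words[1]] if len(words) > 1 else 1
        actions.extend([primitives[verb]] * count)
    return actions


def held_out_split(commands, verb):
    """Keep the bare verb in training; move its compositions to the test set."""
    train = [c for c in commands if verb not in c.split() or c == verb]
    test = [c for c in commands if verb in c.split() and c != verb]
    return train, test


commands = ["walk", "jump", "walk twice", "jump twice", "walk and run", "run and jump"]
train, test = held_out_split(commands, "jump")
```

Under this split the learner sees "jump" only in isolation and must produce the actions for "jump twice" or "run and jump" at test time, which is the kind of systematic generalization the abstract illustrates with "dax". A rule-based interpreter handles a new verb trivially once its meaning is added to the lexicon, for example `interpret("dax twice", {**PRIMITIVES, "dax": "DAX"})`.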
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
