When can transformers compositionally generalize in-context?
Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes von Oswald, Razvan Pascanu, Guillaume Lajoie, João Sacramento

TL;DR
This paper investigates the conditions under which transformers learning in-context can generalize compositionally, and finds that an explicit separation between task inference and task execution is crucial for such generalization.
Contribution
The study introduces a modular multitask setting that gives precise control over compositional structure in the data, and demonstrates that a bottleneck enforcing an explicit separation between task inference and task execution enables transformers to generalize compositionally.
Findings
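Transformers learning in-context struggle to generalize compositionally when task inference and task execution are not explicitly separated.
A bottleneck separating task inference from task execution makes compositional generalization possible.
The failure is not one of expressivity: transformers are in principle expressive enough to solve the task, yet they do not generalize compositionally without such an architectural constraint.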
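The separation between task inference and task execution can be illustrated with a toy linear example. This is a minimal sketch under stated assumptions, not the architecture used in the paper: each task is assumed to be a linear combination of a few fixed linear modules, the bottleneck is assumed to be the vector of mixing coefficients `z`, and the names `infer_task` and `execute_task` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_MODULES = 4  # assumed number of reusable components
DIM = 3          # assumed input/output dimension

# Assumption: every module is a fixed random linear map shared across tasks.
modules = rng.normal(size=(NUM_MODULES, DIM, DIM))


def infer_task(xs, ys):
    """Task inference: compress the context into mixing coefficients z."""
    # feats[n, i, k] = (W_k @ x_n)[i], the output of module k on example n.
    feats = np.einsum("kij,nj->nik", modules, xs)
    A = feats.reshape(-1, NUM_MODULES)  # one row per (example, output dim)
    b = ys.reshape(-1)
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z


def execute_task(z, x_query):
    """Task execution: uses only the bottleneck z, never the raw context."""
    W = np.einsum("k,kij->ij", z, modules)
    return W @ x_query


# A task that combines modules 0 and 2.
z_true = np.array([1.0, 0.0, 1.0, 0.0])
W_true = np.einsum("k,kij->ij", z_true, modules)

xs = rng.normal(size=(8, DIM))  # in-context inputs
ys = xs @ W_true.T              # in-context targets
x_query = rng.normal(size=DIM)

z_hat = infer_task(xs, ys)
y_pred = execute_task(z_hat, x_query)
```

Because execution depends on the context only through `z`, any combination of modules is handled the same way, including combinations never seen together before. In this toy the modules are given; in the learned setting they would have to be discovered from data.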
Abstract
Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.
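The combinatorial structure described above can be sketched in a few lines. This is an illustrative toy, not the paper's data generation process: the number of modules, the number of modules per task, the use of summed linear maps, and the helper `sample_context` are all assumptions made for the example.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

NUM_MODULES = 6       # assumed number of independent components
MODULES_PER_TASK = 2  # assumed number of components combined in one task
DIM = 4               # assumed input/output dimension

# Assumption: each component is a random linear map.
modules = rng.normal(size=(NUM_MODULES, DIM, DIM))

# Every task is one combination of components.
all_tasks = list(itertools.combinations(range(NUM_MODULES), MODULES_PER_TASK))

# Hold out every third combination; the rest are seen during training.
held_out_tasks = [t for i, t in enumerate(all_tasks) if i % 3 == 0]
train_tasks = [t for i, t in enumerate(all_tasks) if i % 3 != 0]

# Each component still appears in at least one training task, so held-out
# tasks are new combinations of familiar components.
seen_modules = {m for task in train_tasks for m in task}


def sample_context(task, num_examples=8):
    """Draw in-context (x, y) pairs for one task (hypothetical helper)."""
    W = modules[list(task)].sum(axis=0)  # compose the selected components
    xs = rng.normal(size=(num_examples, DIM))
    ys = xs @ W.T
    return xs, ys


xs, ys = sample_context(train_tasks[0])
```

Compositional generalization in this sketch would mean performing well on contexts drawn from `held_out_tasks` after training only on contexts drawn from `train_tasks`.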
Taxonomy
Topics: Neural Networks and Applications
