When can transformers compositionally generalize in-context?

Seijin Kobayashi; Simon Schug; Yassir Akram; Florian Redhardt; Johannes von Oswald; Razvan Pascanu; Guillaume Lajoie; João Sacramento

arXiv:2407.12275·cs.LG·July 18, 2024

TL;DR

This paper investigates the conditions under which transformers learning in-context can generalize compositionally, and finds that an explicit separation between task inference and task execution is crucial for such generalization.

Contribution

The study introduces a modular multitask setting in which the compositional structure of the data can be precisely controlled, and presents evidence that a bottleneck enforcing an explicit separation between task inference and task execution enables transformers to generalize compositionally.

Findings

01

Transformers learning in-context struggle with compositional generalization when task inference and task execution are not explicitly separated.

02

A bottleneck separating inference and execution improves compositional generalization.

03

Transformers are in principle expressive enough to solve the task, yet they fail to generalize compositionally in-context without an architectural bottleneck.

Abstract

Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.
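The combinatorial structure described in the abstract can be illustrated with a small sketch. This is not the paper's data generation process: it assumes, purely for illustration, that each module is a linear map, that a task is an unordered set of modules whose outputs are summed, and that every module appears in at least one training task. The names `make_task_split`, `covers_all_modules`, and `task_output` are hypothetical.

```python
import itertools
import random


def covers_all_modules(tasks, num_modules):
    """Return True if every module index appears in at least one task."""
    seen = {m for task in tasks for m in task}
    return seen == set(range(num_modules))


def make_task_split(num_modules, modules_per_task, train_fraction,
                    seed=0, max_tries=1000):
    """Enumerate all module combinations and split them into training
    tasks and held-out tasks (illustrative assumption, not the paper's
    protocol). The split is resampled until the training tasks cover
    every module, so held-out tasks only recombine seen components."""
    all_tasks = list(itertools.combinations(range(num_modules),
                                            modules_per_task))
    n_train = int(len(all_tasks) * train_fraction)
    rng = random.Random(seed)
    for _ in range(max_tries):
        rng.shuffle(all_tasks)
        train, held_out = all_tasks[:n_train], all_tasks[n_train:]
        if covers_all_modules(train, num_modules):
            return train, held_out
    raise ValueError("no split found in which training covers all modules")


def task_output(modules, task, x):
    """Assumed composition rule: sum the outputs of the selected linear
    modules, each given as a weight vector applied to the input x."""
    return sum(
        sum(w * xi for w, xi in zip(modules[m], x))
        for m in task
    )


if __name__ == "__main__":
    train, held_out = make_task_split(num_modules=6, modules_per_task=2,
                                      train_fraction=0.5)
    print("training tasks:", sorted(train))
    print("held-out tasks:", sorted(held_out))
```

With 6 modules taken 2 at a time there are already 15 possible tasks, and the count grows quickly with the number of modules. Compositional generalization in this sketch means performing well on the held-out combinations after training only on the training ones.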

Taxonomy

Topics: Neural Networks and Applications