Scaling can lead to compositional generalization
Florian Redhardt, Yassir Akram, Simon Schug

TL;DR
This paper shows that scaling data and model size enables standard neural networks to generalize compositionally across tasks. It supports this with a theoretical approximation result and with a decoding analysis that links internal representations to task success.
Contribution
It shows that standard neural networks can generalize compositionally when they are scaled up and trained on data that sufficiently covers the task space. Both theoretical and empirical evidence support this claim.
Findings
Scaling data and model size improves compositional generalization.
Standard multilayer perceptrons can provably approximate a general class of compositional task families to arbitrary precision using only a linear number of neurons.
Linearly decodable task constituents correlate with successful generalization.
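The linear-decodability finding can be made concrete with a linear probe. The sketch below is not the authors' code or protocol. It assumes an idealized, hypothetical representation in which a hidden activation is the sum of random per-constituent embeddings plus noise. It then fits one least-squares linear readout per task module and reports held-out accuracy. The helper names (make_hidden_activations, linear_probe_accuracy, run_probe_demo) are illustrative only.

```python
import numpy as np


def make_hidden_activations(n_samples, n_modules, n_options, hidden_dim, noise, rng):
    """Hypothetical stand-in for a network's hidden layer (an assumption, not real data).

    Each task is a combination of n_modules constituents, each picked from n_options.
    The activation is the sum of one random embedding per chosen constituent, plus noise.
    """
    labels = rng.integers(0, n_options, size=(n_samples, n_modules))
    embeddings = rng.normal(size=(n_modules, n_options, hidden_dim))
    # Index shape (n_samples, n_modules, hidden_dim), then sum over modules.
    h = embeddings[np.arange(n_modules), labels].sum(axis=1)
    h += noise * rng.normal(size=h.shape)
    return h, labels


def linear_probe_accuracy(h_train, y_train, h_test, y_test, n_options):
    """Fit a least-squares linear readout to one-hot targets; return test accuracy."""
    X_train = np.hstack([h_train, np.ones((len(h_train), 1))])  # add bias column
    X_test = np.hstack([h_test, np.ones((len(h_test), 1))])
    targets = np.eye(n_options)[y_train]  # one-hot encode constituent labels
    W, *_ = np.linalg.lstsq(X_train, targets, rcond=None)
    preds = (X_test @ W).argmax(axis=1)
    return float((preds == y_test).mean())


def run_probe_demo(seed=0):
    rng = np.random.default_rng(seed)
    n_modules, n_options = 3, 4
    h, labels = make_hidden_activations(3000, n_modules, n_options,
                                        hidden_dim=64, noise=0.1, rng=rng)
    train, test = slice(0, 2000), slice(2000, None)
    # One probe per task module (constituent slot).
    accs = [linear_probe_accuracy(h[train], labels[train, m],
                                  h[test], labels[test, m], n_options)
            for m in range(n_modules)]
    # Control: the same probe trained on shuffled labels should sit near chance (1/n_options).
    shuffled = rng.permutation(labels[train, 0])
    control = linear_probe_accuracy(h[train], shuffled, h[test], labels[test, 0], n_options)
    return accs, control


if __name__ == "__main__":
    accs, control = run_probe_demo()
    print("per-module probe accuracy:", accs)
    print("shuffled-label control:", control)
```

The shuffled-label control separates genuine linear decodability from probe capacity. In an actual experiment, h would come from a trained network's hidden layer rather than from the synthetic generator assumed here.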
Abstract
Can neural networks systematically capture discrete, compositional task structure despite their continuous, distributed nature? The impressive capabilities of large-scale neural networks suggest that the answer to this question is yes. However, even for the most capable models, there are still frequent failure cases that raise doubts about their compositionality. Here, we seek to understand what it takes for a standard neural network to generalize over tasks that share compositional structure. We find that simply scaling data and model size leads to compositional generalization. We show that this holds across different task encodings as long as the training distribution sufficiently covers the task space. In line with this finding, we prove that standard multilayer perceptrons can approximate a general class of compositional task families to arbitrary precision using only a linear…
