A Theoretical Analysis of Compositional Generalization in Neural Networks: A Necessary and Sufficient Condition
Yuanpeng Li

TL;DR
This paper establishes a necessary and sufficient condition for neural networks to achieve compositional generalization. The condition ties together architecture design, regularization, and training data properties, and it is backed by mathematical proofs.
Contribution
It derives a necessary and sufficient condition for compositional generalization in neural networks, supported by proofs and illustrative examples.
Findings
The condition requires the model's computational graph to match the true compositional structure of the task
Each component must encode just enough information during training
The condition could potentially be used to assess compositional generalization ability before training
Abstract
Compositional generalization is a crucial property in artificial intelligence, enabling models to handle novel combinations of known components. While most deep learning models lack this capability, certain models succeed in specific tasks, suggesting the existence of governing conditions. This paper derives a necessary and sufficient condition for compositional generalization in neural networks. Conceptually, it requires that (i) the computational graph matches the true compositional structure, and (ii) components encode just enough information in training. The condition is supported by mathematical proofs. This criterion combines aspects of architecture design, regularization, and training data properties. A carefully designed minimal example illustrates an intuitive understanding of the condition. We also discuss the potential of the condition for assessing compositional…
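The following is a deliberately tiny, non-neural Python sketch of the intuition behind condition (i). It is not the paper's minimal example or proof. Everything in it is a hypothetical assumption: a toy task whose output is the pair (color label, shape label), a training set that omits three color-shape combinations while still showing every color and every shape, and lookup tables standing in for trained components. When each output component is wired only to the input it truly depends on, the model recovers the held-out combinations. When each component is wired to both inputs, it memorizes the training pairs and fails on every novel combination.

```python
from itertools import product

# Hypothetical toy task, not taken from the paper.
COLORS = ["red", "green", "blue"]
SHAPES = ["circle", "square", "triangle"]


def true_fn(color, shape):
    """Ground truth is compositional: each output slot depends on one input slot."""
    return (color.upper(), shape.upper())


# Hold out some combinations; every color and every shape still appears in training.
ALL_PAIRS = list(product(COLORS, SHAPES))
HELD_OUT = {("red", "triangle"), ("green", "square"), ("blue", "circle")}
TRAIN = [p for p in ALL_PAIRS if p not in HELD_OUT]
TEST = sorted(HELD_OUT)


def fit_component(data, key_fn, slot):
    """'Train' one component as a lookup table from its inputs to one output slot."""
    table = {}
    for color, shape in data:
        table[key_fn(color, shape)] = true_fn(color, shape)[slot]
    return table


def make_model(color_key, shape_key):
    """Build a two-component model; each key_fn decides which inputs a component sees."""
    color_table = fit_component(TRAIN, color_key, 0)
    shape_table = fit_component(TRAIN, shape_key, 1)

    def predict(color, shape):
        # Unseen keys return None, i.e. the component has nothing to say.
        return (color_table.get(color_key(color, shape)),
                shape_table.get(shape_key(color, shape)))

    return predict


def accuracy(model, pairs):
    return sum(model(c, s) == true_fn(c, s) for c, s in pairs) / len(pairs)


# (a) Graph matches the true structure: each component sees only its own input.
matched = make_model(lambda c, s: c, lambda c, s: s)
# (b) Graph does not match: each component is wired to both inputs.
entangled = make_model(lambda c, s: (c, s), lambda c, s: (c, s))

for name, model in [("matched", matched), ("entangled", entangled)]:
    print(name, "train", accuracy(model, TRAIN), "test", accuracy(model, TEST))
```

Both variants fit the training set perfectly, but only the structure-matched one reaches full accuracy on the held-out combinations; the entangled one scores zero on them. Real networks replace the lookup tables with learned functions, so this only illustrates the structural intuition, not the paper's formal condition.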
Taxonomy
Topics: Neural Networks and Applications
