Measuring Compositional Generalization: A Comprehensive Method on Realistic Data
Daniel Keysers, Nathanael Sch\"arli, Nathan Scales, Hylke Buisman,, Daniel Furrer, Sergii Kashubin, Nikola Momchev, Danila Sinopalnikov, Lukasz, Stafiniak, Tibor Tihon, Dmitry Tsarkov, Xiao Wang, Marc van Zee, Olivier, Bousquet

TL;DR
This paper introduces a systematic method for constructing realistic benchmarks to evaluate compositional generalization in machine learning, revealing current models' limitations and the negative impact of compound divergence on accuracy.
Contribution
The authors propose a novel approach to create compositionality benchmarks that maximize compound divergence with minimal atom divergence, and demonstrate its effectiveness with new datasets.
Findings
Machine learning models struggle with compositional generalization.
Higher compound divergence correlates with lower accuracy.
The new benchmark datasets reveal significant generalization gaps.
Abstract
State-of-the-art machine learning methods exhibit limited compositional generalization. At the same time, there is a lack of realistic benchmarks that comprehensively measure this ability, which makes it challenging to find and evaluate improvements. We introduce a novel method to systematically construct such benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets, and we quantitatively compare this method to other approaches for creating compositional generalization benchmarks. We present a large and realistic natural language question answering dataset that is constructed according to this method, and we use it to analyze the compositional generalization ability of three machine learning architectures. We find that they fail to generalize compositionally and that there is a surprisingly strong negative correlation between…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
MethodsTest
