Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

Daniel Keysers; Nathanael Schärli; Nathan Scales; Hylke Buisman; Daniel Furrer; Sergii Kashubin; Nikola Momchev; Danila Sinopalnikov; Lukasz Stafiniak; Tibor Tihon; Dmitry Tsarkov; Xiao Wang; Marc van Zee; Olivier Bousquet

arXiv:1912.09713 · cs.LG · June 26, 2020 · 55 cites

TL;DR

This paper introduces a systematic method for constructing realistic benchmarks to evaluate compositional generalization in machine learning, revealing current models' limitations and the negative impact of compound divergence on accuracy.

Contribution

The authors propose a novel approach to create compositionality benchmarks that maximize compound divergence with minimal atom divergence, and demonstrate its effectiveness with new datasets.
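To make the two quantities concrete, here is a minimal sketch of how a divergence between train and test distributions could be computed. It assumes a Chernoff-coefficient form, 1 - sum_k p_k^alpha * q_k^(1 - alpha), with alpha = 0.5 for atoms and alpha = 0.1 for compounds; neither the formula nor the alpha values are stated on this page, so treat them as assumptions to check against the full paper. Words and adjacent word pairs stand in for atoms and compounds purely for illustration, and every function name is hypothetical.

```python
from collections import Counter


def chernoff_divergence(p_counts, q_counts, alpha):
    """Return 1 - sum_k p_k^alpha * q_k^(1 - alpha) for two count tables."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both distributions need at least one observation")
    # Keys missing from either side contribute zero when 0 < alpha < 1.
    shared = set(p_counts) & set(q_counts)
    coefficient = sum(
        (p_counts[k] / p_total) ** alpha * (q_counts[k] / q_total) ** (1 - alpha)
        for k in shared
    )
    return 1.0 - coefficient


def word_atoms(sentences):
    """Toy stand-in for atoms: individual words."""
    return Counter(word for s in sentences for word in s.split())


def bigram_compounds(sentences):
    """Toy stand-in for compounds: adjacent word pairs."""
    counts = Counter()
    for s in sentences:
        words = s.split()
        counts.update(zip(words, words[1:]))
    return counts


def atom_divergence(train, test, alpha=0.5):
    # alpha = 0.5 is an assumption, not taken from this page.
    return chernoff_divergence(word_atoms(train), word_atoms(test), alpha)


def compound_divergence(train, test, alpha=0.1):
    # alpha = 0.1 is an assumption, not taken from this page.
    return chernoff_divergence(
        bigram_compounds(train), bigram_compounds(test), alpha
    )


# Same words on both sides, but recombined differently.
train = ["who directed inception", "who produced titanic"]
test = ["who directed titanic", "who produced inception"]

print("atom divergence:", atom_divergence(train, test))
print("compound divergence:", compound_divergence(train, test))
```

In this toy split the word frequencies are identical, so the atom divergence is approximately 0, while only half of the word pairs are shared, so the compound divergence is approximately 0.5. That is the kind of split the method aims for: familiar parts, unfamiliar combinations.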

Findings

01. The three evaluated machine learning architectures fail to generalize compositionally.

02. Higher compound divergence between train and test sets correlates strongly with lower accuracy.

03. The new benchmark datasets reveal significant generalization gaps.

Abstract

State-of-the-art machine learning methods exhibit limited compositional generalization. At the same time, there is a lack of realistic benchmarks that comprehensively measure this ability, which makes it challenging to find and evaluate improvements. We introduce a novel method to systematically construct such benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets, and we quantitatively compare this method to other approaches for creating compositional generalization benchmarks. We present a large and realistic natural language question answering dataset that is constructed according to this method, and we use it to analyze the compositional generalization ability of three machine learning architectures. We find that they fail to generalize compositionally and that there is a surprisingly strong negative correlation between…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
