Improving Compositional Generalization with Latent Structure and Data Augmentation

Linlu Qiu; Peter Shaw; Panupong Pasupat; Paweł Krzysztof Nowak; Tal Linzen; Fei Sha; Kristina Toutanova

arXiv:2112.07610 · cs.CL · May 6, 2022

TL;DR

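This paper introduces the Compositional Structure Learner (CSL), a generative model with a quasi-synchronous context-free grammar backbone induced from the training data. Recombined examples sampled from CSL augment the fine-tuning data of T5, improving compositional generalization and achieving state-of-the-art results on semantic parsing tasks.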

Contribution

The paper presents CSL, a generative model with a grammar backbone, and uses it as a data augmentation method to improve compositional generalization in neural networks.

Findings

01. CSL effectively transfers compositional bias to T5 models.

02. Augmentation with CSL improves performance on semantic parsing tasks.

03. Achieves state-of-the-art results on real-world compositional generalization benchmarks.

Abstract

Generic unstructured neural networks have been shown to struggle on out-of-distribution compositional generalization. Compositional data augmentation via example recombination has transferred some prior knowledge about compositionality to such black-box neural models for several semantic parsing tasks, but this often required task-specific engineering or provided limited gains. We present a more powerful data recombination method using a model called Compositional Structure Learner (CSL). CSL is a generative model with a quasi-synchronous context-free grammar backbone, which we induce from the training data. We sample recombined examples from CSL and add them to the fine-tuning data of a pre-trained sequence-to-sequence model (T5). This procedure effectively transfers most of CSL's compositional bias to T5 for diagnostic tasks, and results in a model even stronger than a T5-CSL…
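The recombination idea described in the abstract can be illustrated with a toy sketch. This is not the authors' CSL: the grammar below is hand-written rather than induced from training data, sampling is uniform rather than drawn from a learned generative model, and the T5 fine-tuning step is omitted. The names `RULES`, `sample`, and `augment`, the example utterances, and the target program syntax are all hypothetical.

```python
import random

# Hypothetical toy synchronous grammar (hand-written, NOT induced as in CSL).
# Each rule pairs a source template with a target template; the nonterminal
# NP appears on both sides and is expanded identically in each.
RULES = {
    "S": [("how many NP", "count ( NP )"),
          ("list NP", "answer ( NP )")],
    "NP": [("rivers", "river"),
           ("states", "state"),
           ("rivers in NP", "river ( loc NP )")],
}


def sample(symbol, rng, depth=0, max_depth=3):
    """Sample one aligned (source, target) pair derived from `symbol`."""
    options = RULES[symbol]
    if depth >= max_depth:
        # At the depth limit, keep only rules with no nonterminals
        # so that the recursion is guaranteed to terminate.
        options = [r for r in options
                   if not any(tok in RULES for tok in r[0].split())]
    src, tgt = rng.choice(options)
    src_out, tgt_out = [], tgt.split()
    for tok in src.split():
        if tok in RULES:
            sub_src, sub_tgt = sample(tok, rng, depth + 1, max_depth)
            src_out.append(sub_src)
            # Substitute the same expansion into the aligned target slot.
            tgt_out[tgt_out.index(tok)] = sub_tgt
        else:
            src_out.append(tok)
    return " ".join(src_out), " ".join(tgt_out)


def augment(train, n_samples, seed=0):
    """Append novel sampled pairs to the original training pairs."""
    rng = random.Random(seed)
    seen = set(train)
    augmented = list(train)
    for _ in range(n_samples):
        pair = sample("S", rng)
        if pair not in seen:
            seen.add(pair)
            augmented.append(pair)
    return augmented


train = [("how many rivers", "count ( river )"),
         ("list states", "answer ( state )")]
augmented = augment(train, n_samples=50)
```

In this sketch, `augmented` keeps the original pairs first and appends recombinations such as a counting question over a nested noun phrase, which never co-occurred in `train`. In the paper's setting, data of this kind would be added to the fine-tuning set of the sequence-to-sequence model.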

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Layer Normalization · Inverse Square Root Schedule · Attention Dropout · Adafactor