TL;DR
This paper investigates methods to improve compositional generalization in semantic parsing models, focusing on attention module enhancements and training strategies to better handle out-of-distribution data.
Contribution
It introduces multiple extensions to the attention mechanism and training procedures that enhance compositional generalization in semantic parsing models.
Findings
Using contextual representations such as ELMo and BERT improves compositional generalization.
Aligning decoder attention with token alignments enhances performance.
Downsampling frequent program templates reduces overfitting.
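The last finding, downsampling frequent program templates, can be illustrated with a minimal sketch. This summary does not describe the paper's exact procedure, so everything here is an assumption for illustration: `program_template` is a hypothetical masking function (quoted strings and numbers are replaced with placeholders), and `max_per_template` is a hypothetical cap on how many training examples may share one template.

```python
import random
import re
from collections import defaultdict


def program_template(program: str) -> str:
    """Hypothetical template function: mask quoted strings, then numbers."""
    masked = re.sub(r"'[^']*'|\"[^\"]*\"", "<STR>", program)
    return re.sub(r"\b\d+(\.\d+)?\b", "<NUM>", masked)


def downsample_by_template(examples, max_per_template, seed=0):
    """Keep at most `max_per_template` (utterance, program) pairs per template."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    groups = defaultdict(list)
    for utterance, program in examples:
        groups[program_template(program)].append((utterance, program))

    kept = []
    for template in sorted(groups):  # sorted for a deterministic order
        group = groups[template]
        if len(group) > max_per_template:
            group = rng.sample(group, max_per_template)
        kept.extend(group)
    return kept


if __name__ == "__main__":
    data = [(f"q{i}", f"SELECT name FROM city WHERE pop > {i}") for i in range(5)]
    data.append(("other", "SELECT COUNT(*) FROM city"))
    for utterance, program in downsample_by_template(data, max_per_template=2):
        print(utterance, "->", program)
```

Capping each template rather than sampling uniformly keeps rare templates intact while limiting how often the model sees the most common program structures.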
Abstract
Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently. Specifically, compositional generalization, i.e., whether a model generalizes to new structures built of components observed during training, has sparked substantial interest. In this work, we investigate compositional generalization in semantic parsing, a natural test-bed for compositional generalization, as output programs are constructed from sub-components. We analyze a wide variety of models and propose multiple extensions to the attention module of the semantic parser, aiming to improve compositional generalization. We find that the following factors improve compositional generalization: (a) using contextual representations, such as ELMo and BERT, (b) informing the decoder what input tokens have previously been attended to, (c) training the decoder attention to agree with…
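Factor (b), informing the decoder which input tokens it has already attended to, is commonly realized with a coverage vector that accumulates past attention distributions. The sketch below is a simplified illustration and not the paper's formulation: in a trained parser the coverage would typically enter the attention scorer through learned parameters, whereas here a fixed, hypothetical `coverage_weight` penalty stands in for that. The function names are also assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_with_coverage(step_scores, coverage_weight=1.0):
    """Apply a coverage penalty to raw attention scores at each decoding step.

    step_scores: one list of raw attention scores (over input tokens) per step.
    Returns the per-step attention distributions and the final coverage vector.
    """
    num_tokens = len(step_scores[0])
    coverage = [0.0] * num_tokens  # attention mass each input token has received
    attentions = []
    for scores in step_scores:
        # Assumption: subtract a fixed multiple of the coverage from the scores.
        adjusted = [s - coverage_weight * c for s, c in zip(scores, coverage)]
        attn = softmax(adjusted)
        attentions.append(attn)
        coverage = [c + a for c, a in zip(coverage, attn)]
    return attentions, coverage


if __name__ == "__main__":
    raw = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    attentions, coverage = decode_with_coverage(raw)
    print(attentions)
    print(coverage)
```

With identical raw scores at both steps, the second step places less weight on the first token than the first step does, because that token has already accumulated coverage.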
Taxonomy
Methods: Linear Layer · Tanh Activation · Sigmoid Activation · WordPiece · Long Short-Term Memory · Bidirectional LSTM · Adam · Softmax · Multi-Head Attention · Layer Normalization
