Learning to Generalize Compositionally by Transferring Across Semantic Parsing Tasks
Wang Zhu, Peter Shaw, Tal Linzen, Fei Sha

TL;DR
This paper proposes a method to improve compositional generalization in neural semantic parsers: the representation and the task-specific layers are trained differently on a pre-finetuning task, so that the learned representation transfers to a different target dataset and performs better on mismatched splits.
Contribution
The paper introduces a novel training strategy that separates representation learning from task-specific layers to enhance compositional transfer learning in semantic parsing.
Findings
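Significant improvement in compositional generalization over baselines on the test set of the target task.
Effective transfer among three very different datasets (COGS, GeoQuery, and SCAN), used alternately as the pre-finetuning and target task.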
Ablation studies validate the utility of the proposed method.
Abstract
Neural network models often generalize poorly to mismatched domains or distributions. In NLP, this issue arises in particular when models are expected to generalize compositionally, that is, to novel combinations of familiar words and constructions. We investigate learning representations that facilitate transfer learning from one compositional task to another: the representation and the task-specific layers of the models are strategically trained differently on a pre-finetuning task such that they generalize well on mismatched splits that require compositionality. We apply this method to semantic parsing, using three very different datasets, COGS, GeoQuery and SCAN, used alternately as the pre-finetuning and target task. Our method significantly improves compositional generalization over baselines on the test set of the target task, which is held out during fine-tuning. Ablation…
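The abstract says only that the representation and the task-specific layers are "strategically trained differently" on the pre-finetuning task. As a rough illustration of what that can mean in practice, the sketch below alternates gradient updates so that a task head is updated on one data split and the encoder on another. This is an assumption made for illustration, not the authors' algorithm: the tiny linear model, the squared-error loss, and the names `loss_and_grads`, `alternating_pretrain`, `split_a`, and `split_b` are all hypothetical.

```python
import numpy as np


def loss_and_grads(enc, head, x, y):
    """Mean squared error of a two-layer linear model, with its gradients.

    enc  : (d_in, d_hidden) weights producing the shared representation
    head : (d_hidden, d_out) weights of the task-specific layer
    """
    h = x @ enc                      # representation
    pred = h @ head                  # task-specific output
    err = pred - y
    loss = float(np.mean(err ** 2))
    g_pred = 2.0 * err / err.size    # d(loss)/d(pred)
    g_head = h.T @ g_pred            # gradient w.r.t. the head
    g_enc = x.T @ (g_pred @ head.T)  # gradient w.r.t. the encoder
    return loss, g_enc, g_head


def alternating_pretrain(enc, head, split_a, split_b, steps=300, lr=0.05):
    """Hypothetical schedule: head learns from split_a, encoder from split_b."""
    enc, head = enc.copy(), head.copy()  # leave the caller's arrays untouched
    for _ in range(steps):
        # Step 1: update only the head on split A (encoder held fixed).
        _, _, g_head = loss_and_grads(enc, head, *split_a)
        head -= lr * g_head
        # Step 2: update only the encoder on split B (head held fixed).
        _, g_enc, _ = loss_and_grads(enc, head, *split_b)
        enc -= lr * g_enc
    return enc, head


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = rng.normal(size=(4, 2))
    x_a = rng.normal(size=(64, 4))
    x_b = rng.normal(size=(64, 4)) + 0.5   # toy stand-in for a mismatched split
    split_a = (x_a, x_a @ true_w)
    split_b = (x_b, x_b @ true_w)
    enc0 = 0.1 * rng.normal(size=(4, 3))
    head0 = 0.1 * rng.normal(size=(3, 2))
    enc, head = alternating_pretrain(enc0, head0, split_a, split_b)
    print("loss on split B before:", loss_and_grads(enc0, head0, *split_b)[0])
    print("loss on split B after: ", loss_and_grads(enc, head, *split_b)[0])
```

In a transfer setting like the one the abstract describes, the encoder returned here would be carried over to the target task and paired with a new task-specific head. How the paper actually handles that step is not stated in the abstract.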
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
