Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality
Yichen Jiang, Xiang Zhou, Mohit Bansal

TL;DR
This paper introduces Mutual Exclusivity Training and prim2primX primitive augmentation to improve the compositional generalization of seq2seq models, and reports significant empirical gains on the SCAN and COGS datasets.
Contribution
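It proposes two methods that improve the systematic generalization of seq2seq models: Mutual Exclusivity Training, which targets the lack of a mutual exclusivity bias, and prim2primX data augmentation, which targets the tendency to memorize whole examples instead of separating structure from content.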
Findings
Significant performance improvements on SCAN and COGS datasets.
Mutual exclusivity training reduces incorrect generalizations.
Primitive augmentation diversifies argument structures effectively.
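To make the primitive augmentation finding concrete, here is a minimal sketch in Python. It assumes that augmentation means minting new variants of each primitive and substituting them into existing examples on both the source and target side. The function name `augment_primitives`, the `_1`, `_2` suffix naming, and the toy SCAN-like examples are all hypothetical illustrations, not the authors' procedure, which may differ in detail.

```python
def augment_primitives(examples, primitives, num_variants):
    """Return the original examples plus copies with primitives swapped for new variants.

    examples:     list of (source_tokens, target_tokens) pairs
    primitives:   dict mapping a source word to its target symbol (assumed known)
    num_variants: how many new variants to mint per primitive (hypothetical knob)
    """
    augmented = list(examples)
    for src, tgt in examples:
        for word, symbol in primitives.items():
            if word not in src:
                continue
            for i in range(1, num_variants + 1):
                # Hypothetical naming scheme for the minted primitive pair.
                new_word, new_symbol = f"{word}_{i}", f"{symbol}_{i}"
                new_src = [new_word if t == word else t for t in src]
                new_tgt = [new_symbol if t == symbol else t for t in tgt]
                augmented.append((new_src, new_tgt))
    return augmented


# Toy SCAN-like data (illustrative only, not taken from the datasets).
examples = [(["jump", "twice"], ["JUMP", "JUMP"]), (["walk"], ["WALK"])]
primitives = {"jump": "JUMP", "walk": "WALK"}
augmented = augment_primitives(examples, primitives, num_variants=2)
```

Under this assumption, each structure such as "X twice" appears with several different arguments, so a model has less opportunity to tie the structure to one specific word.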
Abstract
Recent datasets expose a lack of systematic generalization ability in standard sequence-to-sequence models. In this work, we analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias (i.e., a source sequence already mapped to a target sequence is less likely to be mapped to other target sequences), and the tendency to memorize whole examples rather than separating structures from contents. We propose two techniques to address these two issues respectively: Mutual Exclusivity Training, which prevents the model from producing seen generations when facing novel, unseen examples via an unlikelihood-based loss; and prim2primX data augmentation, which automatically diversifies the arguments of every syntactic function to prevent memorizing and provide a compositional inductive bias without exposing test-set data. Combining these two…
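The abstract mentions an unlikelihood-based loss without giving its form. As a rough sketch, a generic token-level unlikelihood term sums -log(1 - p_t(y_t)) over the tokens of a sequence the model should not produce. The Python below illustrates that generic term only; the function name, the dictionary representation of per-step probabilities, and the toy numbers are assumptions for illustration, and the paper's exact formulation may differ.

```python
import math


def unlikelihood_loss(step_probs, negative_target, eps=1e-8):
    """Sum of -log(1 - p_t(y_t)) over the tokens of an unwanted output.

    step_probs:      list of dicts, token -> model probability at each decoding step
    negative_target: token sequence the model should NOT produce, e.g. an output
                     already seen in training, paired here with a novel input
    eps:             floor that keeps the logarithm finite when p is close to 1
    """
    total = 0.0
    for probs, token in zip(step_probs, negative_target):
        p = probs.get(token, 0.0)
        total += -math.log(max(1.0 - p, eps))
    return total


# Toy per-step distributions for a novel input (made-up numbers).
confident = [{"JUMP": 0.9, "WALK": 0.1}, {"JUMP": 0.8, "<eos>": 0.2}]
hesitant = [{"JUMP": 0.1, "WALK": 0.9}, {"JUMP": 0.2, "<eos>": 0.8}]
negative = ["JUMP", "JUMP"]  # a previously seen output to discourage

loss_confident = unlikelihood_loss(confident, negative)
loss_hesitant = unlikelihood_loss(hesitant, negative)
```

The penalty grows as the model puts more probability on the already-seen output, so `loss_confident` is larger than `loss_hesitant`. This is the pressure a mutual exclusivity bias would supply.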
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · Topic Modeling
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence
