Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations
Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong Pasupat, Yuan Zhang

TL;DR
This paper demonstrates that using carefully designed intermediate representations significantly enhances the compositional generalization of pre-trained seq2seq models in semantic parsing tasks, achieving state-of-the-art results.
Contribution
The paper applies intermediate representations to pre-trained seq2seq models without altering the model architecture, leading to substantial improvements in compositional generalization.
Findings
Gains 14.8 accuracy points on the CFQ dataset
Gains 15.0 to 19.4 accuracy points on text-to-SQL datasets
Intermediate representations are a key factor for better compositional generalization
Abstract
Sequence-to-sequence (seq2seq) models are prevalent in semantic parsing, but have been found to struggle at out-of-distribution compositional generalization. While specialized model architectures and pre-training of seq2seq models have been proposed to address this issue, the former often comes at the cost of generality and the latter only shows limited success. In this paper, we study the impact of intermediate representations on compositional generalization in pre-trained seq2seq models, without changing the model architecture at all, and identify key aspects for designing effective representations. Instead of training to directly map natural language to an executable form, we map to a reversible or lossy intermediate representation that has stronger structural correspondence with natural language. The combination of our proposed intermediate representations and pre-trained models is…
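As a rough illustration of what a reversible intermediate representation could look like, the Python sketch below groups the conjuncts of a flat, SPARQL-like query by shared subject and predicate, then expands them back. The grouping rule, the function names `to_intermediate` and `from_intermediate`, and the example triples are all assumptions made for illustration; they are not the representations defined in the paper.

```python
# Toy sketch of a reversible intermediate representation (hypothetical,
# not the paper's actual transformation). A seq2seq model would be trained
# to predict the grouped form; the inverse restores the executable form.

def to_intermediate(triples):
    """Group (subject, predicate, object) triples that share subject and predicate."""
    groups = {}
    for subj, pred, obj in triples:
        groups.setdefault((subj, pred), []).append(obj)
    # dicts preserve insertion order, so groups appear in first-seen order
    return [(subj, pred, tuple(objs)) for (subj, pred), objs in groups.items()]


def from_intermediate(grouped):
    """Expand grouped entries back into flat triples."""
    return [(subj, pred, obj) for subj, pred, objs in grouped for obj in objs]


# Hypothetical conjuncts for a question like "Who directed and produced M1 and directed M2?"
triples = [
    ("?x0", "directed", "M1"),
    ("?x0", "directed", "M2"),
    ("?x0", "produced", "M1"),
]

grouped = to_intermediate(triples)
restored = from_intermediate(grouped)

# The round trip recovers the same set of conjuncts (order may differ in general).
assert sorted(restored) == sorted(triples)
```

The grouped form is shorter and mirrors how conjunctions are phrased in natural language ("directed M1 and M2"), which is the kind of structural correspondence the abstract refers to. A lossy representation would instead discard information that has to be re-inferred when converting back to the executable form.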
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Genomics and Phylogenetic Studies
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence
