Compositional generalization in semantic parsing with pretrained transformers
A. Emin Orhan

TL;DR
This paper investigates the limits of pretraining benefits in semantic parsing, revealing that pretrained models transfer knowledge broadly across domains but face constraints, and larger models benefit more from pretraining.
Contribution
It demonstrates the transferability of pretrained models across different domains and highlights the importance of pretraining scale and domain similarity for semantic parsing.
Findings
Pretraining on non-English and programming languages improves English semantic parsing.
Pretraining on protein sequences generally worsens performance on benchmarks.
Larger models benefit more from pretraining and are harder to train from scratch.
Abstract
Large-scale pretraining instills large amounts of knowledge in deep neural networks. This, in turn, improves the generalization behavior of these models in downstream tasks. What exactly are the limits to the generalization benefits of large-scale pretraining? Here, we report observations from some simple experiments aimed at addressing this question in the context of two semantic parsing tasks involving natural language, SCAN and COGS. We show that language models pretrained exclusively with non-English corpora, or even with programming language corpora, significantly improve out-of-distribution generalization in these benchmarks, compared with models trained from scratch, even though both benchmarks are English-based. This demonstrates the surprisingly broad transferability of pretrained representations and knowledge. Pretraining with a large-scale protein sequence prediction task, on…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
