Encoder-Agnostic Adaptation for Conditional Language Generation
Zachary M. Ziegler, Luke Melas-Kyriazi, Sebastian Gehrmann, and Alexander M. Rush

TL;DR
This paper introduces a novel encoder-agnostic method called pseudo self attention for adapting pretrained language models to conditional text generation tasks, demonstrating improved performance and coherence.
Contribution
It proposes a new technique for conditioning pretrained models directly in self attention, addressing previous limitations in encoder-agnostic generation adaptation.
Findings
Outperforms strong baselines on four tasks
Produces coherent and high-quality generations
Data-efficient adaptation method
Abstract
Large pretrained language models have changed the way researchers approach discriminative natural language understanding tasks, leading to the dominance of approaches that adapt a pretrained model for arbitrary downstream tasks. However, it is an open question how to use similar techniques for language generation. Early results in the encoder-agnostic setting have been mostly negative. In this work we explore methods for adapting a pretrained language model to arbitrary conditional input. We observe that pretrained transformer models are sensitive to large parameter changes during tuning. We therefore propose an adaptation that directly injects arbitrary conditioning into self attention, an approach we call pseudo self attention. Through experiments on four diverse conditional text generation tasks we show that this encoder-agnostic technique outperforms strong baselines, produces…
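To make "directly injects arbitrary conditioning into self attention" concrete, here is a minimal single-head sketch in NumPy. It assumes the formulation, as we understand it, in which the conditioning representation X is projected by newly introduced key and value matrices and prepended to the decoder's own keys and values, while the queries still come only from the decoder states Y. The names U_k and U_v, the 1/sqrt(d) scaling, and the causal mask are illustrative assumptions drawn from standard transformer practice, not details taken from the text above.

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax over the last axis.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

def pseudo_self_attention(X, Y, W_q, W_k, W_v, U_k, U_v):
    """Single-head sketch (assumed formulation, not the authors' code).

    X: (n_cond, d_x) conditioning vectors from any encoder.
    Y: (n_dec, d) decoder hidden states.
    W_q, W_k, W_v: pretrained self-attention projections, shape (d, d_head).
    U_k, U_v: new projections for the conditioning, shape (d_x, d_head).
    """
    n_cond, n_dec = X.shape[0], Y.shape[0]
    queries = Y @ W_q
    # Conditioning keys/values are prepended to the decoder's own.
    keys = np.concatenate([X @ U_k, Y @ W_k], axis=0)
    values = np.concatenate([X @ U_v, Y @ W_v], axis=0)
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    # Every decoder position sees all conditioning positions,
    # but only current and earlier decoder positions (causal mask).
    causal = np.tril(np.ones((n_dec, n_dec), dtype=bool))
    visible = np.concatenate(
        [np.ones((n_dec, n_cond), dtype=bool), causal], axis=1
    )
    scores = np.where(visible, scores, -np.inf)
    return softmax(scores) @ values

# Toy usage with random weights; all sizes are arbitrary.
rng = np.random.default_rng(0)
d, d_x, d_head = 8, 5, 4
X = rng.normal(size=(3, d_x))
Y = rng.normal(size=(6, d))
W_q, W_k, W_v = (rng.normal(size=(d, d_head)) for _ in range(3))
U_k, U_v = (rng.normal(size=(d_x, d_head)) for _ in range(2))
out = pseudo_self_attention(X, Y, W_q, W_k, W_v, U_k, U_v)
print(out.shape)  # (6, 4)
```

In this sketch only U_k and U_v are new; the pretrained projections are reused unchanged, which is one way to read the abstract's point about avoiding large parameter changes during tuning.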
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
