Text Generation by Learning from Demonstrations
Richard Yuanzhe Pang, He He

TL;DR
This paper introduces GOLD, a reinforcement learning-based method for text generation that learns from demonstrations to improve quality, reduce exposure bias, and be less sensitive to decoding strategies.
Contribution
The paper presents GOLD (generation by off-policy learning from demonstrations), an off-policy reinforcement learning algorithm that learns from reference demonstrations by importance weighting in order to improve text generation quality.
Findings
GOLD outperforms MLE and policy gradient in summarization, question generation, and translation.
Models trained with GOLD are less sensitive to decoding algorithms.
GOLD alleviates exposure bias in text generation.
Abstract
Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation. This paradigm leads to (i) diverse but low-quality samples due to mismatched learning objective and evaluation metric (likelihood vs. quality) and (ii) exposure bias due to mismatched history distributions (gold vs. model-generated). To alleviate these problems, we frame text generation as an offline reinforcement learning (RL) problem with expert demonstrations (i.e., the reference), where the goal is to maximize quality given model-generated histories. We propose GOLD (generation by off-policy learning from demonstrations): an easy-to-optimize algorithm that learns from the demonstrations by importance weighting. Intuitively, GOLD upweights confident tokens and downweights unconfident ones in the reference during training, avoiding optimization issues faced by prior RL…
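The upweighting intuition in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the per-token weight is simply the model's own probability of the reference token, held constant as a weight, and the names `mle_loss`, `weighted_demo_loss`, `token_weights`, and the `min_weight` floor are hypothetical. The exact weighting and reward definitions are not given in this summary.

```python
import math

def mle_loss(token_probs):
    """Standard MLE: mean negative log-likelihood of the reference tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def token_weights(token_probs, min_weight=0.0):
    """Assumed weighting: the model's current probability of each reference
    token, optionally floored at min_weight (a hypothetical parameter)."""
    return [max(p, min_weight) for p in token_probs]

def weighted_demo_loss(token_probs, min_weight=0.0):
    """Weighted variant: each reference token's NLL is scaled by its weight,
    so confident tokens count more and unconfident tokens count less."""
    weights = token_weights(token_probs, min_weight)
    total = sum(w * -math.log(p) for w, p in zip(weights, token_probs))
    return total / len(token_probs)

# Model probabilities for three reference tokens (made-up numbers).
probs = [0.9, 0.5, 0.05]
print("MLE loss:     ", mle_loss(probs))
print("Weighted loss:", weighted_demo_loss(probs))
print("Weights:      ", token_weights(probs))
```

Under plain MLE the low-probability token (0.05) dominates the loss. Under the weighted variant its contribution is scaled down by its own small weight, which matches the abstract's description of downweighting unconfident reference tokens.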
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
