Text Generation by Learning from Demonstrations

Richard Yuanzhe Pang; He He

arXiv:2009.07839 · cs.CL · March 4, 2021 · 5 citations

TL;DR

This paper introduces GOLD, a reinforcement learning-based method for text generation that learns from demonstrations to improve quality, reduce exposure bias, and be less sensitive to decoding strategies.

Contribution

The paper presents GOLD, an off-policy reinforcement learning algorithm that learns from demonstrations via importance weighting to improve text generation quality.

Findings

1. GOLD outperforms MLE and policy gradient in summarization, question generation, and translation.

2. Models trained with GOLD are less sensitive to decoding algorithms.

3. GOLD alleviates exposure bias in text generation.

Abstract

Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation. This paradigm leads to (i) diverse but low-quality samples due to mismatched learning objective and evaluation metric (likelihood vs. quality) and (ii) exposure bias due to mismatched history distributions (gold vs. model-generated). To alleviate these problems, we frame text generation as an offline reinforcement learning (RL) problem with expert demonstrations (i.e., the reference), where the goal is to maximize quality given model-generated histories. We propose GOLD (generation by off-policy learning from demonstrations): an easy-to-optimize algorithm that learns from the demonstrations by importance weighting. Intuitively, GOLD upweights confident tokens and downweights unconfident ones in the reference during training, avoiding optimization issues faced by prior RL…
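
The intuition in the abstract (upweight reference tokens the model is confident about, downweight the rest) can be illustrated with a small sketch. This is not the authors' implementation and is based only on the abstract's description: it assumes the per-token weight is the model's own probability of the reference token, treated as a constant, and it assumes a lower bound on that weight. The function names and the `min_weight` parameter are hypothetical.

```python
import math


def mle_loss(ref_log_probs):
    """Standard MLE loss: mean negative log-probability of the reference tokens."""
    return -sum(ref_log_probs) / len(ref_log_probs)


def token_weights(ref_log_probs, min_weight=0.1):
    """Per-token weights (assumption): the model's probability of each
    reference token, floored at min_weight so no token is ignored entirely.
    In a real training loop these would be excluded from the gradient."""
    return [max(math.exp(lp), min_weight) for lp in ref_log_probs]


def weighted_demo_loss(ref_log_probs, min_weight=0.1):
    """Weighted variant of the MLE loss: confident reference tokens keep
    most of their loss contribution, unconfident ones are scaled down."""
    weights = token_weights(ref_log_probs, min_weight)
    weighted = [w * lp for w, lp in zip(weights, ref_log_probs)]
    return -sum(weighted) / len(ref_log_probs)


# Hypothetical log-probabilities the model assigns to two reference tokens:
# one it is confident about (p = 0.9) and one it is not (p = 0.05).
example_log_probs = [math.log(0.9), math.log(0.05)]
example_weights = token_weights(example_log_probs)
```

In this sketch the low-probability token contributes far less to the weighted loss than it does under plain MLE, which is the behavior the abstract describes in words. The paper itself should be consulted for the exact objective and reward definitions.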

Code & Models

Repositories

yzpang/gold-off-policy-text-gen-iclr21
PyTorch · Official

Videos

Text Generation by Learning from Demonstrations · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications