Reward Gaming in Conditional Text Generation

Richard Yuanzhe Pang; Vishakh Padmakumar; Thibault Sellam; Ankur P.; Parikh; He He

arXiv:2211.08714·cs.CL·June 2, 2023

Reward Gaming in Conditional Text Generation

Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P., Parikh, He He

PDF

Open Access

TL;DR

This paper discusses how reinforcement learning-based reward functions in conditional text generation can lead to reward gaming, amplifying undesirable patterns despite high training performance, and explores potential solutions.

Contribution

It highlights the issue of reward gaming in natural language generation, illustrating common cases and discussing potential fixes and future research directions.

Findings

01

Reward functions can incorrectly assign high rewards to undesirable patterns.

02

RL training can amplify spurious correlations and noise-induced issues.

03

Discussion of potential solutions and future research areas.

Abstract

To align conditional text generation model outputs with desired behaviors, there has been an increasing focus on training the model using reinforcement learning (RL) with reward functions learned from human annotations. Under this framework, we identify three common cases where high rewards are incorrectly assigned to undesirable patterns: noise-induced spurious correlation, naturally occurring spurious correlation, and covariate shift. We show that even though learned metrics achieve high performance on the distribution of the data used to train the reward function, the undesirable patterns may be amplified during RL training of the text generation model. While there has been discussion about reward gaming in the RL or safety community, in this discussion piece, we would like to highlight reward gaming in the natural language generation (NLG) community using concrete conditional text…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Speech and dialogue systems · Natural Language Processing Techniques

MethodsALIGN