Reward Gaming in Conditional Text Generation
Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P. Parikh, He He

TL;DR
This paper shows how reward functions learned from human annotations can be gamed when they are used for RL training of conditional text generation models: undesirable patterns get amplified even though the reward function performs well on the data it was trained on. It also explores potential solutions.
Contribution
It highlights reward gaming for the natural language generation (NLG) community, illustrating three common cases and discussing potential fixes and future research directions.
Findings
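Learned reward functions can incorrectly assign high rewards to undesirable patterns in three common cases: noise-induced spurious correlation, naturally occurring spurious correlation, and covariate shift.
RL training can amplify these patterns even when the learned metric performs well on the distribution of the data used to train it.
The paper discusses potential fixes and future research directions.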
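The amplification effect described in the findings can be made concrete with a toy sketch in Python. Everything in it is an assumption made for illustration and is not the paper's setup: the two features (`quality`, `pattern_count`), the hand-built annotation set, the naive per-feature reward fit, the three candidate outputs, and the exact-gradient softmax policy update. It is meant only to show how a reward model that ranks its own training data perfectly can still pull the policy toward an undesirable output.

```python
import math

# Hypothetical toy setup: each output is (quality, pattern_count).
# The true reward depends only on quality; pattern_count is a spurious feature.
def true_reward(quality, pattern_count):
    return float(quality)

# Assumed annotation data (quality, pattern_count, label): the pattern mostly
# co-occurs with good outputs and never appears more than once.
train = ([(1, 1, 1.0)] * 45 + [(0, 0, 0.0)] * 45
         + [(1, 0, 1.0)] * 5 + [(0, 1, 0.0)] * 5)

def mean_label(rows):
    return sum(label for _, _, label in rows) / len(rows)

def fit_weight(feature_index):
    # Naive per-feature fit: mean label with the feature minus mean label without it.
    present = [row for row in train if row[feature_index] > 0]
    absent = [row for row in train if row[feature_index] == 0]
    return mean_label(present) - mean_label(absent)

w_quality, w_pattern = fit_weight(0), fit_weight(1)

def learned_reward(quality, pattern_count):
    return w_quality * quality + w_pattern * pattern_count

# In-distribution check: how often a good training example outranks a bad one.
good = [row for row in train if row[2] == 1.0]
bad = [row for row in train if row[2] == 0.0]
ranking_accuracy = sum(
    learned_reward(g[0], g[1]) > learned_reward(b[0], b[1])
    for g in good for b in bad
) / (len(good) * len(bad))

# Outputs the generator can produce. The last one is low quality but repeats
# the pattern three times, which the annotation data never covered.
candidates = [(1, 0), (1, 1), (0, 3)]

def softmax(logits):
    exps = [math.exp(x - max(logits)) for x in logits]
    return [e / sum(exps) for e in exps]

def expected(policy, reward_fn):
    return sum(p * reward_fn(*c) for p, c in zip(policy, candidates))

# Exact policy-gradient ascent on the learned reward for a softmax policy:
# d J / d logit_i = p_i * (r_i - J).
logits = [0.0, 0.0, 0.0]
initial_policy = softmax(logits)
for _ in range(500):
    policy = softmax(logits)
    baseline = expected(policy, learned_reward)
    logits = [
        logit + 1.0 * p * (learned_reward(*c) - baseline)
        for logit, p, c in zip(logits, policy, candidates)
    ]
final_policy = softmax(logits)

print("ranking accuracy on training data:", ranking_accuracy)
print("learned reward before/after:",
      expected(initial_policy, learned_reward), expected(final_policy, learned_reward))
print("true reward before/after:",
      expected(initial_policy, true_reward), expected(final_policy, true_reward))
```

In this sketch the learned reward keeps rising during training while the true reward falls, because the policy concentrates on the candidate that repeats the spurious pattern. A real reward model and generator would be far more complex, so the sketch should be read as intuition rather than as evidence.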
Abstract
To align conditional text generation model outputs with desired behaviors, there has been an increasing focus on training the model using reinforcement learning (RL) with reward functions learned from human annotations. Under this framework, we identify three common cases where high rewards are incorrectly assigned to undesirable patterns: noise-induced spurious correlation, naturally occurring spurious correlation, and covariate shift. We show that even though learned metrics achieve high performance on the distribution of the data used to train the reward function, the undesirable patterns may be amplified during RL training of the text generation model. While there has been discussion about reward gaming in the RL or safety community, in this discussion piece, we would like to highlight reward gaming in the natural language generation (NLG) community using concrete conditional text…
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: ALIGN
