Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models

Roberto-Rafael Maura-Rivero; Chirag Nagpal; Roma Patel; Francesco Visin

arXiv:2501.06248·cs.LG·February 26, 2025

TL;DR

This paper introduces a utility-inspired reward transformation for reinforcement learning in language models, improving training by better handling reward sensitivities and dependencies, leading to more helpful and less harmful outputs.

Contribution

It proposes a novel Inada-inspired reward transformation that addresses limitations of linear reward aggregation in RL training of language models.

Findings

01. Inada-inspired rewards improve the helpfulness of generated text

02. Models trained with this method are less harmful

03. The method outperforms traditional reward averaging in experiments

Abstract

Current methods that train large language models (LLMs) with reinforcement learning feedback often resort to averaging the outputs of multiple reward functions during training. This overlooks crucial aspects of individual reward dimensions and inter-reward dependencies, which can lead to sub-optimal generations. In this work, we show that linear aggregation of rewards exhibits vulnerabilities that can lead to undesired properties in generated text. We then propose a transformation of reward functions inspired by the economic theory of utility functions (specifically, the Inada conditions) that enhances sensitivity to low reward values while diminishing sensitivity to already high values. We compare our approach to existing baseline methods that linearly aggregate rewards and show that the Inada-inspired reward feedback is superior to traditional weighted averaging. We…
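To make the contrast in the abstract concrete, here is a minimal Python sketch of why a concave, Inada-style transformation behaves differently from plain weighted averaging. The exact transformation used in the paper is not given on this page, so the power utility `u(r) = r**alpha` below is an assumed stand-in (it is steep near zero and flat for large values), and the function names, the weights, and the normalization of rewards to [0, 1] are all hypothetical choices for illustration.

```python
import math


def power_utility(r, alpha=0.5, eps=1e-6):
    """Assumed concave utility u(r) = r**alpha with 0 < alpha < 1.

    Its slope grows without bound as r approaches 0 and shrinks as r grows,
    which mirrors the qualitative behavior the abstract describes.
    Rewards are clipped to [eps, 1] (an assumption made for this sketch).
    """
    r = min(max(r, eps), 1.0)
    return r ** alpha


def marginal_utility(r, alpha=0.5, eps=1e-6):
    """Derivative of the assumed utility: alpha * r**(alpha - 1)."""
    r = min(max(r, eps), 1.0)
    return alpha * r ** (alpha - 1.0)


def linear_aggregate(rewards, weights):
    """Baseline: weighted average of the raw rewards."""
    return sum(w * r for w, r in zip(weights, rewards))


def utility_aggregate(rewards, weights, alpha=0.5):
    """Weighted average of the transformed rewards."""
    return sum(w * power_utility(r, alpha) for w, r in zip(weights, rewards))


# Two hypothetical responses scored on (helpfulness, harmlessness).
weights = [0.5, 0.5]
balanced = [0.5, 0.5]    # moderate on both dimensions
lopsided = [0.99, 0.01]  # very helpful, but scores very low on harmlessness

# Linear averaging cannot tell the two responses apart.
linear_balanced = linear_aggregate(balanced, weights)
linear_lopsided = linear_aggregate(lopsided, weights)

# The transformed aggregate penalizes the very low harmlessness score.
utility_balanced = utility_aggregate(balanced, weights)
utility_lopsided = utility_aggregate(lopsided, weights)

print("linear :", linear_balanced, linear_lopsided)
print("utility:", utility_balanced, utility_lopsided)
print("slope at 0.01 vs 0.99:", marginal_utility(0.01), marginal_utility(0.99))
```

Under these assumptions, both responses receive the same linear score, while the transformed score ranks the balanced response higher, and the gradient signal from improving the low reward is roughly ten times larger than from improving the already high one. A different concave function or a different reward range would change the numbers but not the qualitative picture.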
