Reward Models in Deep Reinforcement Learning: A Survey

Rui Yu; Shenghua Wan; Yucen Wang; Chen-Xiao Gao; Le Gan; Zongzhang Zhang; De-Chuan Zhan

arXiv:2506.15421·cs.LG·June 19, 2025

Open Access

TL;DR

This survey reviews recent reward modeling techniques in deep reinforcement learning, covering their applications, evaluation methods, and future research directions for improving alignment with true objectives.

Contribution

The survey provides the first systematic overview of reward modeling approaches in deep RL, categorizing the methods and discussing their applications and evaluation strategies.

Findings

01. Categorization of reward modeling techniques by source, mechanism, and learning paradigm.

02. Discussion of applications and evaluation methods for reward models.

03. Identification of promising future research directions.

Abstract

In reinforcement learning (RL), agents continually interact with the environment and use the feedback to refine their behavior. To guide policy optimization, reward models are introduced as proxies of the desired objectives, such that when the agent maximizes the accumulated reward, it also fulfills the task designer's intentions. Recently, significant attention from both academic and industrial researchers has focused on developing reward models that not only align closely with the true objectives but also facilitate policy optimization. In this survey, we provide a comprehensive review of reward modeling techniques within the deep RL literature. We begin by outlining the background and preliminaries in reward modeling. Next, we present an overview of recent reward modeling approaches, categorizing them based on the source, the mechanism, and the learning paradigm. Building on this…

Taxonomy

Topics: Reinforcement Learning in Robotics