Analyzing and Bridging the Gap between Maximizing Total Reward and Discounted Reward in Deep Reinforcement Learning

Shuyu Yin; Fei Wen; Peilin Liu; Tao Luo

arXiv:2407.13279 · cs.LG · March 19, 2025

TL;DR

This paper analyzes the performance gap between maximizing total reward and discounted reward in deep reinforcement learning, proposing methods to align these objectives for improved policy optimization.

Contribution

It provides a theoretical analysis of the gap between the total-return and discounted-return objectives and introduces two novel approaches to align them in deep RL.
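To make the gap concrete, the sketch below compares the total return (the plain sum of rewards) with the discounted return (the sum of γ^t · r_t) on two hypothetical reward sequences. The trajectories and discount factors are invented for illustration and are not taken from the paper; the example is acyclic, so it shows only how the two objectives can rank the same outcomes differently, not the paper's result about cyclic states.

```python
def total_return(rewards):
    """Undiscounted (total) return: the plain sum of rewards."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """Discounted return: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical trajectories: A receives reward 10 on its 20th step,
# B receives reward 5 on its first step and then terminates.
traj_a = [0.0] * 19 + [10.0]
traj_b = [5.0]

for gamma in (0.9, 0.95, 0.99):
    prefers_a = discounted_return(traj_a, gamma) > discounted_return(traj_b, gamma)
    print(f"gamma={gamma}: discounted objective prefers {'A' if prefers_a else 'B'}")

prefers_a_total = total_return(traj_a) > total_return(traj_b)
print("total objective prefers", "A" if prefers_a_total else "B")
```

Here the total objective always prefers A, while the discounted objective prefers A only when 10·γ^19 > 5, i.e. for γ above roughly 0.964; at γ = 0.9 and γ = 0.95 it prefers B.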

Findings

01

Increasing the discount factor may not eliminate the reward gap in environments with cyclic states.

02

Modifying terminal state values can help align total and discounted rewards.

03

Calibrating reward data improves robustness and performance in off-policy deep RL.

Abstract

The optimal objective is a fundamental aspect of reinforcement learning (RL), as it determines how policies are evaluated and optimized. While total return maximization is the ideal objective in RL, discounted return maximization is the practical objective due to its stability. This can lead to a misalignment of objectives. To better understand the problem, we theoretically analyze the performance gap between the policy that maximizes the total return and the policy that maximizes the discounted return. Our analysis reveals that increasing the discount factor can be ineffective at eliminating this gap when the environment contains cyclic states, a frequent scenario. To address this issue, we propose two alternative approaches to align the objectives. The first approach achieves alignment by modifying the terminal state value, treating it as a tunable hyperparameter with its suitable range defined…
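In standard one-step TD learning the target is r + γ · V(s′), and V(s′) is fixed to 0 when s′ is terminal. The first approach described above treats the terminal state value as a tunable hyperparameter instead. The sketch below shows only where such a constant would enter a one-step TD target; the function name `td_targets` and the parameter `terminal_value` are hypothetical, and the suitable range the paper derives for this value is not reproduced here.

```python
def td_targets(rewards, next_values, dones, gamma, terminal_value=0.0):
    """One-step TD targets r + gamma * V(s') for a batch of transitions.

    For non-terminal transitions, V(s') is the critic's estimate of the next
    state. For terminal transitions the usual convention is V(s') = 0; here
    that constant is exposed as the hypothetical hyperparameter
    `terminal_value` (default 0.0 recovers the standard target).
    """
    targets = []
    for r, v_next, done in zip(rewards, next_values, dones):
        # Bootstrap from the tunable constant on terminal transitions,
        # otherwise from the critic's next-state estimate.
        bootstrap = terminal_value if done else v_next
        targets.append(r + gamma * bootstrap)
    return targets


# Usage: the second transition is terminal, so only its target changes.
print(td_targets([1.0, 0.0], [2.0, 5.0], [False, True], gamma=0.5))
print(td_targets([1.0, 0.0], [2.0, 5.0], [False, True], gamma=0.5, terminal_value=-1.0))
```

Keeping the default at 0.0 means existing code paths are unaffected until the value is deliberately tuned.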

Taxonomy

Topics: EEG and Brain-Computer Interfaces

Methods: ALIGN