Taylor Expansion of Discount Factors
Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

TL;DR
This paper investigates the impact of using different discount factors during learning versus evaluation in reinforcement learning, proposing a family of interpolated objectives that improve value estimation and policy optimization.
Contribution
It introduces a novel framework for interpolating value functions between two discount factors, providing new methods for value estimation and policy updates with empirical benefits.
Findings
Empirical performance gains in RL tasks using the proposed interpolation framework.
Insights into the effects of discount factor discrepancies on learning.
New methods for value function estimation and policy optimization.
Abstract
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors. Our analysis suggests new ways for estimating value functions and performing policy optimization updates, which demonstrate empirical performance gains. This framework also leads to new insights on commonly-used deep RL heuristic modifications to policy optimization algorithms.
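The interpolation between two discount factors can be illustrated in the tabular case. The sketch below is not taken from the paper's code; it assumes a fixed policy with a known transition matrix P and reward vector r (both hypothetical, randomly generated here), and relies only on the matrix identity V_γ' = Σ_k ((γ' − γ)(I − γP)^{-1} P)^k V_γ, which follows from factoring I − γ'P = (I − γP)(I − (γ' − γ)(I − γP)^{-1} P). Truncating the sum at order K gives a quantity that equals V_γ at K = 0 and approaches V_γ' as K grows. The function names `value` and `taylor_value` are illustrative.

```python
import numpy as np


def value(P, r, gamma):
    """Exact value function V_gamma = (I - gamma * P)^{-1} r for a fixed policy."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def taylor_value(P, r, gamma, gamma_prime, order):
    """Truncated expansion of V_{gamma'} around V_gamma.

    Returns sum_{k=0}^{order} M^k V_gamma with
    M = (gamma' - gamma) * (I - gamma * P)^{-1} P.
    """
    n = P.shape[0]
    A = np.eye(n) - gamma * P
    term = value(P, r, gamma)  # zeroth-order term: V_gamma itself
    total = term.copy()
    for _ in range(order):
        # Apply M to the previous term without forming an explicit inverse.
        term = (gamma_prime - gamma) * np.linalg.solve(A, P @ term)
        total += term
    return total


# Hypothetical 5-state Markov reward process (assumption for illustration).
rng = np.random.default_rng(0)
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)  # make each row a probability distribution
r = rng.random(5)

gamma, gamma_prime = 0.9, 0.95
target = value(P, r, gamma_prime)

# Max-norm gap to V_{gamma'} shrinks as more terms are included.
errors = [
    np.max(np.abs(taylor_value(P, r, gamma, gamma_prime, k) - target))
    for k in (0, 1, 5, 20)
]
print(errors)
```

With 0 ≤ γ < γ' < 1, the series converges because the operator applied at each step has norm at most (γ' − γ)/(1 − γ) < 1; in this example that ratio is 0.5, so each additional term roughly halves the remaining gap.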
Taxonomy
Topics: Energy Efficiency and Management · Reinforcement Learning in Robotics
