Discounted Reinforcement Learning Is Not an Optimization Problem
Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton

TL;DR
This paper argues that, when function approximation is used for control in continuing tasks, discounted reinforcement learning is not a well-defined optimization problem, and it advocates alternatives such as average reward maximization.
Contribution
It clarifies why discounting is fundamentally incompatible with function approximation for control in continuing tasks, and it promotes rigorous optimization approaches such as average reward maximization.
Findings
With function approximation, discounted RL for control in continuing tasks is not an optimization problem in its usual formulation.
As a consequence, no optimal policy exists under discounting when function approximation is used.
The authors recommend maximizing average reward for control in continuing tasks.
Abstract
Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. It is not an optimization problem in its usual formulation, so when using function approximation there is no optimal policy. We substantiate these claims, then go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches, such as maximizing average reward, for reinforcement learning in continuing tasks.
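The claim that no optimal policy exists can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a hypothetical two-state, two-action continuing MDP (`P`, `R`), a discount factor `GAMMA`, and a policy class restricted to choosing the same action in both states, which stands in for function approximation that cannot tell the states apart. Under these assumptions each policy has one discounted value per state, so the two policies can disagree on which is better, whereas the average reward gives a single number per policy.

```python
# Hypothetical continuing MDP (illustrative only, not from the paper).
# P[a][s][s2] is the probability of moving from s to s2 under action a;
# R[a][s] is the expected reward for taking action a in state s.
GAMMA = 0.9
P = {0: [[0.9, 0.1], [0.1, 0.9]],
     1: [[0.1, 0.9], [0.9, 0.1]]}
R = {0: [1.0, 0.0],
     1: [0.0, 1.2]}


def discounted_values(a, gamma=GAMMA):
    """Per-state discounted values of the policy that always takes action a.

    Solves the 2x2 system (I - gamma * P) v = r with Cramer's rule.
    """
    m00 = 1.0 - gamma * P[a][0][0]
    m01 = -gamma * P[a][0][1]
    m10 = -gamma * P[a][1][0]
    m11 = 1.0 - gamma * P[a][1][1]
    det = m00 * m11 - m01 * m10
    r0, r1 = R[a]
    return ((r0 * m11 - m01 * r1) / det, (m00 * r1 - m10 * r0) / det)


def average_reward(a):
    """Long-run reward per step of the policy that always takes action a.

    Uses the closed-form stationary distribution of a two-state chain.
    """
    p01, p10 = P[a][0][1], P[a][1][0]
    d0 = p10 / (p01 + p10)
    d1 = p01 / (p01 + p10)
    return d0 * R[a][0] + d1 * R[a][1]


v_a = discounted_values(0)
v_b = discounted_values(1)

# Each policy is better than the other in one of the two states,
# so the discounted values alone do not rank them.
neither_dominates = (v_a[0] > v_b[0]) and (v_b[1] > v_a[1])

print("discounted values, always action 0:", v_a)
print("discounted values, always action 1:", v_b)
print("neither policy dominates:", neither_dominates)
print("average rewards:", average_reward(0), average_reward(1))
```

In this toy setting the discounted objective only yields a ranking once the per-state values are combined by some state weighting, which is an extra choice on top of the discounted values themselves. The average reward needs no such choice here because both policies induce a chain with a unique stationary distribution.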
Taxonomy
Topics: Reinforcement Learning in Robotics · Supply Chain and Inventory Management · Adaptive Dynamic Programming Control
