Discounted Reinforcement Learning Is Not an Optimization Problem

Abhishek Naik; Roshan Shariff; Niko Yasui; Hengshuai Yao; Richard S. Sutton

arXiv:1910.02140 · cs.AI · November 28, 2019 · 27 cites

Open Access

TL;DR

This paper argues that, for control in continuing tasks with function approximation, discounted reinforcement learning is not a well-defined optimization problem, and it advocates alternatives such as average-reward maximization.

Contribution

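It clarifies why discounting is fundamentally incompatible with function approximation for control in continuing tasks, addresses misconceptions about how discounting relates to the average-reward formulation, and promotes rigorous optimization objectives such as average-reward maximization.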

Findings

01. With function approximation, discounted RL in its usual formulation is not an optimization problem.

02. As a result, no optimal policy exists for discounted control in continuing tasks when function approximation is used.

03. The authors recommend maximizing average reward for control in continuing tasks.

Abstract

Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. It is not an optimization problem in its usual formulation, so when using function approximation there is no optimal policy. We substantiate these claims, then go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches, such as maximizing average reward, for reinforcement learning in continuing tasks.
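
The claim that there is no optimal policy can be illustrated with a toy case. The sketch below is not taken from the paper: it uses a hypothetical two-state continuing task with made-up rewards, and it assumes the agent's features cannot tell the two states apart, so a deterministic policy must take the same action in both. The names `MDP`, `discounted_values`, and `average_reward` are illustrative only. Under these assumptions, the discounted value of a policy is a separate number per start state, and two policies can each be better from a different state, whereas the long-run reward rate gives one number per policy.

```python
# Hypothetical two-state continuing task; all numbers are made up for illustration.
GAMMA = 0.5

# MDP[action][state] = (next_state, reward); transitions are deterministic.
MDP = {
    "a": {0: (0, 1.0), 1: (0, 0.0)},
    "b": {0: (1, 0.0), 1: (1, 0.8)},
}


def discounted_values(action, gamma=GAMMA, sweeps=200):
    """Iterative policy evaluation for the policy that always takes `action`.

    Returns [v(0), v(1)], the discounted value from each start state.
    """
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [MDP[action][s][1] + gamma * v[MDP[action][s][0]] for s in (0, 1)]
    return v


def average_reward(action, start=0, steps=10_000):
    """Empirical long-run reward per step for the policy that always takes `action`."""
    s, total = start, 0.0
    for _ in range(steps):
        s_next, r = MDP[action][s]
        total += r
        s = s_next
    return total / steps


if __name__ == "__main__":
    for action in ("a", "b"):
        print(action, discounted_values(action), average_reward(action))
```

With `GAMMA = 0.5`, always-`a` has discounted values of about 2.0 and 1.0 from states 0 and 1, while always-`b` has about 0.8 and 1.6. Each policy is preferred from one start state, so neither dominates the other. Their reward rates, about 1.0 and 0.8, do rank them, and in this toy case the rate is the same from either start state.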

Taxonomy

Topics: Reinforcement Learning in Robotics · Supply Chain and Inventory Management · Adaptive Dynamic Programming Control