Average Reward Adjusted Discounted Reinforcement Learning: Near-Blackwell-Optimal Policies for Real-World Applications

Manuel Schneckenreither

arXiv:2004.00857·cs.LG·April 3, 2020·6 cites

Open Access

TL;DR

This paper introduces a novel reinforcement learning algorithm that assesses the average reward separately, yielding near-Blackwell-optimal policies suited to real-world operations research problems, especially those in which the agent permanently receives non-zero rewards such as costs or profit.

Contribution

It offers deep theoretical insights into standard discounted RL and develops a new near-Blackwell-optimal algorithm that improves policy inference in complex operational settings.

Findings

01 The new algorithm infers optimal policies on all tested problems.

02 It addresses limitations of standard discounted RL with non-zero rewards.

03 Proven effectiveness on M/M/1 queuing systems.

Abstract

Although reinforcement learning has become very popular in recent years, successful applications to different kinds of operations research problems remain scarce. Reinforcement learning is based on the well-studied dynamic programming technique and thus also aims at finding the best stationary policy for a given Markov Decision Process, but in contrast it does not require any model knowledge. The policy is assessed solely on consecutive states (or state-action pairs), which are observed while an agent explores the solution space. The contributions of this paper are manifold. First, we provide deep theoretical insights into the widely applied standard discounted reinforcement learning framework, which explain why these algorithms are inappropriate when permanently provided with non-zero rewards, such as costs or profit. Second, we establish a novel…
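As background for the abstract's remark about permanently non-zero rewards, the following is a minimal sketch and not the paper's algorithm. It relies on the standard Laurent series result from MDP theory: for an ergodic chain under a fixed policy, the discounted value satisfies V_γ(s) = ρ/(1-γ) + h(s) + e_γ(s), where ρ is the average reward, h is the bias, and e_γ(s) vanishes as γ approaches 1. The two-state chain `P`, the reward vector `r`, and all function names are made up for illustration.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an ergodic chain (least squares)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def average_reward_and_bias(P, r):
    """Return the average reward rho and the bias vector h of the chain."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    rho = float(pi @ r)
    P_star = np.tile(pi, (n, 1))  # limiting matrix: every row equals pi
    # h = (I - P + P*)^{-1} (r - rho), the standard bias formula
    h = np.linalg.solve(np.eye(n) - P + P_star, r - rho)
    return rho, h

def discounted_value(P, r, gamma):
    """Exact discounted value V_gamma = (I - gamma P)^{-1} r."""
    return np.linalg.solve(np.eye(P.shape[0]) - gamma * P, r)

# Hypothetical chain in which every step yields a non-zero reward.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
r = np.array([1.0, 3.0])

rho, h = average_reward_and_bias(P, r)
for gamma in (0.9, 0.99, 0.999):
    v = discounted_value(P, r, gamma)
    offset = rho / (1.0 - gamma)           # shared by all states, grows as gamma -> 1
    gap = np.max(np.abs(v - offset - h))   # remainder e_gamma, shrinks as gamma -> 1
    print(f"gamma={gamma}: offset={offset:.2f}, spread={v[1] - v[0]:.3f}, gap={gap:.5f}")
```

In this sketch the state-independent offset ρ/(1-γ) grows without bound as γ approaches 1, while the difference between the two states stays bounded. That scale mismatch is one way to read why estimating the average reward separately, as the TL;DR describes, may help.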

Taxonomy

Topics: Reinforcement Learning in Robotics · Supply Chain and Inventory Management · Scheduling and Optimization Algorithms