Reinforcement Learning of Markov Decision Processes with Peak Constraints

Ather Gattami

arXiv:1901.07839·math.OC·December 9, 2019·5 cites

TL;DR

This paper introduces a reinforcement learning approach for Markov Decision Processes with peak constraints. It uses a game-theoretic maximin Q-learning method whose policies are shown to converge to the optimal policies without prior knowledge of the dynamics, reward functions, or constraint functions.

Contribution

It presents the first convergence-guaranteed reinforcement learning algorithms for MDPs with peak constraints, applicable to both discounted and average reward settings.

Findings

01 Maximin Q-learning converges to optimal policies.

02 First algorithms with convergence guarantees for peak-constrained MDPs.

03 Applicable to both discounted and average reward scenarios.

Abstract

In this paper, we consider reinforcement learning of Markov Decision Processes (MDPs) with peak constraints, where an agent chooses a policy to optimize an objective while satisfying additional constraints. The agent must take actions based on the observed states, reward outputs, and constraint outputs, without any knowledge of the dynamics, the reward functions, or the constraint functions. We introduce a game-theoretic approach to constructing reinforcement learning algorithms in which the agent maximizes an unconstrained objective that depends on the simulated action of a minimizing opponent, which acts on a finite set of actions and on the output data of the constraint functions (rewards). We show that the policies obtained from maximin Q-learning converge to the optimal policies. To the best of our knowledge, this is the first time learning algorithms…
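The maximin idea in the abstract can be illustrated with a small tabular sketch. This is not the paper's algorithm: it is a minimal example under assumptions that the abstract does not state. Here the simulated opponent picks which observed output (index 0 for the reward, index 1 for a single constraint output) is scored, the constraint is treated as satisfied when its output is non-negative, and all opponent columns are updated at once because every output is observed. The names `maximin_q_update`, `greedy_action`, and `OUTPUTS`, and the toy numbers, are hypothetical.

```python
import numpy as np

def maximin_q_update(Q, s, a, outputs, s_next, alpha=0.1, gamma=0.9):
    """One tabular update of Q[s, a, o] (assumed form, not taken from the paper).

    Q has shape (n_states, n_actions, n_opponent_actions).
    outputs[o] is the payoff scored if the simulated opponent picks output o.
    """
    # Next-state value: the agent maximizes over a', the opponent minimizes over o'.
    v_next = Q[s_next].min(axis=1).max()
    target = np.asarray(outputs, dtype=float) + gamma * v_next
    # Every output is observed, so all opponent columns are updated together.
    Q[s, a, :] += alpha * (target - Q[s, a, :])
    return Q

def greedy_action(Q, s):
    """Action whose worst case over opponent choices is largest."""
    return int(Q[s].min(axis=1).argmax())

# Toy single-state problem (hypothetical numbers).
# Action 0: reward 1.0, constraint output -1.0 (constraint violated).
# Action 1: reward 0.5, constraint output  0.5 (constraint satisfied).
OUTPUTS = {0: [1.0, -1.0], 1: [0.5, 0.5]}

Q = np.zeros((1, 2, 2))
for _ in range(2000):
    for a in (0, 1):  # sweep both actions deterministically
        Q = maximin_q_update(Q, s=0, a=a, outputs=OUTPUTS[a], s_next=0)

print(greedy_action(Q, 0))
```

In this toy problem the higher-reward action has a worse worst case because of its negative constraint output, so the maximin rule prefers action 1. How the paper itself defines the opponent's action set and the payoff is not recoverable from the truncated abstract.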

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems

Methods: Q-Learning