Reinforcement Learning of Markov Decision Processes with Peak Constraints
Ather Gattami

TL;DR
This paper introduces a reinforcement learning approach for Markov Decision Processes with peak constraints. It uses a game-theoretic maximin Q-learning method that is shown to converge to optimal policies without prior knowledge of the environment.
Contribution
It presents what the author describes as, to the best of his knowledge, the first reinforcement learning algorithms with convergence guarantees for MDPs with peak constraints, applicable to both discounted and average reward settings.
Findings
Maximin Q-learning converges to optimal policies.
First algorithms with convergence guarantees for peak-constrained MDPs.
Applicable to both discounted and average reward scenarios.
Abstract
In this paper, we consider reinforcement learning of Markov Decision Processes (MDPs) with peak constraints, where an agent chooses a policy to optimize an objective while satisfying additional constraints. The agent has to take actions based on the observed states, reward outputs, and constraint outputs, without any knowledge of the dynamics, the reward functions, or the constraint functions. We introduce a game-theoretic approach to construct reinforcement learning algorithms in which the agent maximizes an unconstrained objective that depends on the simulated action of a minimizing opponent, which acts on a finite set of actions and on the output data of the constraint functions (rewards). We show that the policies obtained from maximin Q-learning converge to the optimal policies. To the best of our knowledge, this is the first time learning algorithms…
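The maximin idea in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's algorithm as published. It rests on several assumptions: the simulated opponent's action is an index that selects which observed output (the reward or one of the constraint outputs) is counted, the agent uses pure-strategy maximin rather than a mixed strategy, and the setting is discounted. All names (`maximin_q_learning`, `toy_step`, and so on) and the toy environment are hypothetical.

```python
import random


def maximin_value(q_state):
    """Pure-strategy maximin value: max over agent actions of min over opponent choices."""
    return max(min(q_ao) for q_ao in q_state)


def maximin_action(q_state):
    """Agent action whose worst-case (over opponent choices) Q-value is largest."""
    return max(range(len(q_state)), key=lambda a: min(q_state[a]))


def maximin_q_learning(step, n_states, n_actions, n_outputs,
                       gamma=0.9, alpha=0.1, epsilon=0.2,
                       n_steps=20000, seed=0):
    """Tabular maximin Q-learning sketch (assumed form, see lead-in).

    step(s, a, rng) must return (next_state, outputs), where outputs holds
    the reward followed by the constraint outputs observed for that step.
    """
    rng = random.Random(seed)
    # q[s][a][o]: value of action a in state s if the opponent picks output o.
    q = [[[0.0] * n_outputs for _ in range(n_actions)] for _ in range(n_states)]
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy exploration around the maximin action.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = maximin_action(q[s])
        s_next, outputs = step(s, a, rng)
        v_next = maximin_value(q[s_next])
        # All outputs are observed, so every opponent choice is updated at once.
        for o in range(n_outputs):
            target = outputs[o] + gamma * v_next
            q[s][a][o] += alpha * (target - q[s][a][o])
        s = s_next
    return q


def toy_step(s, a, rng):
    """Hypothetical one-state environment with outputs (reward, constraint output)."""
    outputs = [(1.0, -1.0),   # action 0: high reward, negative constraint output
               (0.5, 0.2)][a]  # action 1: lower reward, positive constraint output
    return 0, outputs


q = maximin_q_learning(toy_step, n_states=1, n_actions=2, n_outputs=2)
policy = [maximin_action(q[s]) for s in range(1)]
```

In this toy problem the learned policy selects action 1: action 0 earns more reward, but its worst-case output is lower, so the simulated minimizing opponent makes it unattractive. A mixed-strategy variant would replace the inner max-min with a small linear program per state.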
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems
Methods: Q-Learning
