Constrained Upper Confidence Reinforcement Learning
Liyuan Zheng, Lillian J. Ratliff

TL;DR
This paper introduces C-UCRL, an algorithm for constrained reinforcement learning with known transition dynamics but unknown rewards and constraints, achieving sub-linear regret while ensuring safety constraints are met.
Contribution
It extends upper confidence reinforcement learning to constrained settings with known transitions, providing a regret guarantee and safety compliance.
Findings
C-UCRL achieves sub-linear regret of order $O(T^{3/4}\sqrt{\log(T/\delta)})$
The algorithm guarantees constraint satisfaction with high probability
Illustrative examples demonstrate practical effectiveness
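To see why a bound of order $T^{3/4}\sqrt{\log(T/\delta)}$ counts as sub-linear, it helps to divide it by $T$ and watch the per-step regret shrink as the horizon grows. The sketch below does only that arithmetic. The function names and the constant `c` are hypothetical, since the summary gives the order of the bound but not its constants.

```python
import math


def regret_bound(T, delta, c=1.0):
    """Evaluate c * T^(3/4) * sqrt(log(T / delta)).

    `c` is a hypothetical placeholder for the constants hidden by the O(.)
    notation; it is not taken from the paper.
    """
    return c * T ** 0.75 * math.sqrt(math.log(T / delta))


def average_regret_bound(T, delta, c=1.0):
    """Per-step version of the bound: regret_bound / T."""
    return regret_bound(T, delta, c) / T


if __name__ == "__main__":
    # The per-step bound decreases as the horizon T grows.
    for T in (10**2, 10**4, 10**6, 10**8):
        print(T, average_regret_bound(T, delta=0.05))
```

The per-step bound behaves like $T^{-1/4}\sqrt{\log(T/\delta)}$, which tends to zero as $T$ grows, so the average reward of the learner approaches that of the comparison policy.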
Abstract
Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning for settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications including exploration of unknown, potentially unsafe, environments. We present an algorithm C-UCRL and show that it achieves sub-linear regret ($O(T^{3/4}\sqrt{\log(T/\delta)})$) with respect to the reward while satisfying the constraints even while learning with probability $1-\delta$. Illustrative examples are provided.
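One way to picture how an upper-confidence method can handle unknown rewards and unknown costs at the same time is sketched below. It is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes rewards and costs lie in $[0, 1]$, uses a standard Hoeffding interval, and takes the upper end of the interval for both quantities, so the reward estimate is optimistic and the cost estimate is conservative. The names `hoeffding_radius` and `confidence_estimates` are hypothetical.

```python
import math


def hoeffding_radius(n, delta):
    """Half-width of a two-sided Hoeffding interval for the mean of n
    samples bounded in [0, 1], holding with probability at least 1 - delta.
    With no samples, n is treated as 1 so the radius stays finite."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * max(n, 1)))


def confidence_estimates(reward_samples, cost_samples, delta):
    """Return (optimistic reward, conservative cost) for one state-action pair.

    Assumption for illustration: using the upper bound on the cost means a
    policy that satisfies the constraint under the estimate also satisfies it
    under the true cost whenever the confidence interval holds.
    """
    n_r, n_c = len(reward_samples), len(cost_samples)
    r_hat = sum(reward_samples) / n_r if n_r else 0.0
    c_hat = sum(cost_samples) / n_c if n_c else 0.0
    r_upper = min(1.0, r_hat + hoeffding_radius(n_r, delta))
    c_upper = min(1.0, c_hat + hoeffding_radius(n_c, delta))
    return r_upper, c_upper


if __name__ == "__main__":
    print(confidence_estimates([0.5] * 100, [0.2] * 100, delta=0.05))
```

Because the transition kernel is known in this setting, estimates like these would be the only uncertain inputs to the planning step; how C-UCRL actually plans with them is described in the paper itself.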
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
