Constrained Upper Confidence Reinforcement Learning

Liyuan Zheng; Lillian J. Ratliff

arXiv:2001.09377 · cs.LG · January 28, 2020 · 29 cites

TL;DR

This paper introduces C-UCRL, an algorithm for constrained reinforcement learning with known transition dynamics but unknown reward and cost functions. It achieves sub-linear regret while satisfying the constraints with high probability, even during learning.
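For readers who want the formal setting, one standard way to write a constrained MDP and its regret is shown below. It assumes an average-reward criterion with $m$ cost functions $c_i$ and thresholds $\tau_i$, and assumes the limits exist (for example, stationary policies on an ergodic chain). The symbols $\tau_i$, $m$ and $\rho^{*}$ are introduced here for illustration, and the paper's exact definitions may differ. The transition kernel $P$ is known; $r$ and the $c_i$ are not.

```latex
\begin{aligned}
\rho^{*} \;=\; \max_{\pi}\;& \lim_{T\to\infty}\frac{1}{T}\,
  \mathbb{E}_{\pi,P}\Big[\sum_{t=1}^{T} r(s_t,a_t)\Big] \\
\text{s.t.}\;& \lim_{T\to\infty}\frac{1}{T}\,
  \mathbb{E}_{\pi,P}\Big[\sum_{t=1}^{T} c_i(s_t,a_t)\Big] \le \tau_i,
  \qquad i = 1,\dots,m, \\[4pt]
R(T) \;=\;& T\rho^{*} - \sum_{t=1}^{T} r(s_t,a_t).
\end{aligned}
```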

Contribution

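It extends upper confidence reinforcement learning (UCRL) to constrained MDPs in which the transition kernel is known but the reward and cost functions are not, and it provides both a sub-linear regret bound and a high-probability guarantee that the constraints are satisfied.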
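Because the transition kernel is known, the long-run reward and cost of any stationary policy can be computed exactly once the reward and cost functions are fixed, so a learner in this setting only has to estimate those functions. The minimal sketch below illustrates that evaluation step on a hypothetical two-state, two-action CMDP. It assumes an average-reward criterion and an ergodic, aperiodic induced chain (so power iteration converges). The helper names, the numbers and the threshold `tau` are illustrative assumptions, not taken from the paper.

```python
# Hypothetical CMDP: P[s][a][s2] is the KNOWN transition kernel.
# r[s][a] and c[s][a] stand in for the reward and cost the learner would
# have to estimate; they are fixed here purely for illustration.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
r = [[0.0, 1.0], [0.5, 2.0]]
c = [[0.0, 1.0], [0.0, 1.0]]


def induced_chain(policy, P):
    """State-to-state transition matrix M[s][s2] under a stochastic policy."""
    n = len(P)
    return [[sum(policy[s][a] * P[s][a][s2] for a in range(len(policy[s])))
             for s2 in range(n)] for s in range(n)]


def stationary_distribution(M, tol=1e-12, max_iters=100_000):
    """Power iteration mu <- mu M; assumes the chain is ergodic and aperiodic."""
    n = len(M)
    mu = [1.0 / n] * n
    for _ in range(max_iters):
        new_mu = [sum(mu[s] * M[s][s2] for s in range(n)) for s2 in range(n)]
        if max(abs(x - y) for x, y in zip(new_mu, mu)) < tol:
            return new_mu
        mu = new_mu
    return mu


def average_value(policy, P, f):
    """Long-run average of f(s, a) under the policy, given the known kernel P."""
    mu = stationary_distribution(induced_chain(policy, P))
    return sum(mu[s] * policy[s][a] * f[s][a]
               for s in range(len(P)) for a in range(len(policy[s])))


policy = [[0.5, 0.5], [0.5, 0.5]]  # uniform random policy
tau = 0.6                          # hypothetical cost threshold
avg_reward = average_value(policy, P, r)
avg_cost = average_value(policy, P, c)
print(f"reward={avg_reward:.4f} cost={avg_cost:.4f} feasible={avg_cost <= tau}")
```

For this chain the stationary distribution under the uniform policy is (5/14, 9/14), so the policy earns 13.75/14 ≈ 0.982 reward per step at an average cost of 0.5, which is within the hypothetical threshold. In a UCRL-style learner the fixed `r` and `c` would be replaced by confidence-bound estimates; with `P` known, the remaining uncertainty lives only in those two functions.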

Findings

1. C-UCRL achieves sub-linear regret of order $O(T^{3/4}\sqrt{\log(T/\delta)})$.
2. With probability $1-\delta$, the algorithm satisfies the constraints even while learning.
3. Illustrative examples are provided to demonstrate the approach.
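The regret bound can be read as a per-step statement: dividing by $T$ gives an average regret of order $T^{-1/4}\sqrt{\log(T/\delta)}$, which tends to zero, so the learner's average reward approaches the optimum. The snippet below tabulates that ratio for a few horizons. The leading constant `C` is a hypothetical placeholder (the O(·) notation hides it), and δ = 0.05 is an arbitrary choice.

```python
import math


def regret_bound(T, delta, C=1.0):
    """Illustrative bound C * T^(3/4) * sqrt(log(T / delta)); C is a hypothetical constant."""
    return C * T ** 0.75 * math.sqrt(math.log(T / delta))


delta = 0.05
horizons = [10**3, 10**4, 10**5, 10**6]
per_step = [regret_bound(T, delta) / T for T in horizons]
for T, ratio in zip(horizons, per_step):
    print(f"T={T:>8}  bound/T={ratio:.4f}")
```

The ratio shrinks only slowly, roughly like $T^{-1/4}$ up to the logarithmic factor, so the per-step gap closes steadily but not quickly.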

Abstract

Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications including exploration of unknown, potentially unsafe, environments. We present an algorithm C-UCRL and show that it achieves sub-linear regret ($O(T^{\frac{3}{4}}\sqrt{\log(T/\delta)})$) with respect to the reward while satisfying the constraints even while learning with probability $1 - \delta$. Illustrative examples are provided.
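Learning with unknown rewards and costs in upper-confidence methods rests on confidence intervals. A common pattern is to use an optimistic (upper) reward estimate to drive exploration and an upper cost estimate pessimistically, so that a policy is treated as safe only if it stays within the threshold under the worst plausible cost. The sketch below shows this with a two-sided Hoeffding interval for a single state-action pair, assuming samples bounded in [0, 1]. It illustrates the general optimism/pessimism idea, not the paper's exact confidence sets; a full algorithm would also need a union bound over state-action pairs and time steps. The function names and threshold are hypothetical.

```python
import math


def hoeffding_radius(n, delta, value_range=1.0):
    """Half-width eps with P(|mean - true mean| >= eps) <= delta for n i.i.d.
    samples in an interval of length value_range (two-sided Hoeffding)."""
    if n == 0:
        return float("inf")
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def upper_estimate(samples, delta, upper_limit=1.0):
    """Empirical mean plus the Hoeffding radius, clipped to the known upper limit.
    Used optimistically for rewards and pessimistically for costs."""
    if not samples:
        return upper_limit
    mean = sum(samples) / len(samples)
    return min(upper_limit, mean + hoeffding_radius(len(samples), delta))


def certified_safe(cost_samples, delta, threshold):
    """Treat the cost as within threshold only if its upper estimate is."""
    return upper_estimate(cost_samples, delta) <= threshold


few = [0.1, 0.2, 0.0, 0.3]  # 4 hypothetical cost observations, mean 0.15
many = few * 25             # 100 observations with the same mean
print(certified_safe(few, 0.05, threshold=0.3))
print(certified_safe(many, 0.05, threshold=0.3))
```

With the same empirical mean of 0.15, four samples leave the upper cost estimate near 0.83, so the threshold of 0.3 cannot be certified; 100 samples tighten it to about 0.29 and the check passes.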

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning