Provably Efficient Model-Free Constrained RL with Linear Function Approximation
Arnob Ghosh, Xingyu Zhou, Ness Shroff

TL;DR
This paper introduces a novel model-free, simulator-free reinforcement learning algorithm for constrained Markov decision processes with linear function approximation, achieving sublinear regret and constraint violation bounds in large-scale systems.
Contribution
It develops the first model-free, simulator-free algorithm with provable sublinear regret and constraint violation bounds for constrained RL with linear function approximation.
Findings
Achieves $\tilde{O}(\sqrt{d^3 H^3 T})$ regret and constraint violation bounds.
Introduces primal-dual optimization into LSVI-UCB for balancing regret and constraints.
Employs a soft-max policy to enable uniform concentration and zero constraint violation.
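The last two findings can be illustrated with a minimal sketch. It assumes the policy is a soft-max over a composite value (reward Q-value plus a dual variable times utility Q-value) and that the dual variable is updated by a projected step on the estimated constraint slack. The function names, parameter names (`temperature`, `step_size`, `dual_cap`), and the exact form of the update are illustrative assumptions, not taken from the paper's pseudocode, and the Q-values are assumed to come from an optimistic LSVI-UCB-style estimator that is not shown.

```python
import math


def softmax_policy(q_reward, q_utility, dual, temperature):
    """Return action probabilities proportional to
    exp(temperature * (Q_r + dual * Q_g)).

    q_reward, q_utility: per-action Q-value estimates (assumed given).
    dual: nonnegative Lagrange multiplier weighting the utility.
    temperature: soft-max sharpness (hypothetical parameter name).
    """
    composite = [qr + dual * qg for qr, qg in zip(q_reward, q_utility)]
    peak = max(composite)  # subtract the max for numerical stability
    weights = [math.exp(temperature * (c - peak)) for c in composite]
    total = sum(weights)
    return [w / total for w in weights]


def dual_update(dual, step_size, threshold, utility_value, dual_cap):
    """Projected dual step (assumed form): increase the multiplier when the
    estimated utility value is below the threshold, decrease it otherwise,
    and keep the result inside [0, dual_cap]."""
    proposed = dual + step_size * (threshold - utility_value)
    return min(max(proposed, 0.0), dual_cap)


# Hypothetical numbers: action 0 has higher reward, action 1 higher utility.
probs = softmax_policy([1.0, 0.5], [0.2, 0.9], dual=2.0, temperature=5.0)
new_dual = dual_update(0.0, step_size=0.5, threshold=1.0,
                       utility_value=0.4, dual_cap=10.0)
```

With a larger dual variable the policy shifts probability toward the action with higher utility, which is how the primal-dual scheme trades reward against the constraint.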
Abstract
We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied by a "simulator", we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems. To this end, we consider the episodic constrained Markov decision processes with linear function approximation, where the transition dynamics and the reward function can be represented as a linear function of some known feature mapping. We show that $\tilde{O}(\sqrt{d^3 H^3 T})$ regret and $\tilde{O}(\sqrt{d^3 H^3 T})$ constraint violation bounds can be achieved, where $d$ is the dimension of the feature mapping, …
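The setting described in the abstract can be written out as a short formal sketch. The notation below is an assumption following the standard linear-MDP convention, not copied from the paper: $\phi$ is the feature mapping, $\mu_h$ and $\theta_{r,h}$ are unknown parameters, $b$ is the constraint threshold, $x_1$ is the initial state, and $\pi_k$ is the policy played in episode $k$ of $K$. The direction of the constraint (utility value at least $b$) is also assumed.

```latex
% Constrained objective (assumed notation)
\max_{\pi} \; V_{r,1}^{\pi}(x_1)
\quad \text{subject to} \quad V_{g,1}^{\pi}(x_1) \ge b

% Linear structure in a known d-dimensional feature map \phi
\mathbb{P}_h(x' \mid x, a) = \langle \phi(x, a), \mu_h(x') \rangle,
\qquad
r_h(x, a) = \langle \phi(x, a), \theta_{r,h} \rangle

% Performance measures over K episodes, with \pi^* an optimal feasible policy
\mathrm{Regret}(K) = \sum_{k=1}^{K} \bigl( V_{r,1}^{\pi^*}(x_1) - V_{r,1}^{\pi_k}(x_1) \bigr),
\qquad
\mathrm{Violation}(K) = \sum_{k=1}^{K} \bigl( b - V_{g,1}^{\pi_k}(x_1) \bigr)
```

Under this reading, "sublinear" means that both sums grow more slowly than the number of episodes, so the average per-episode regret and violation vanish as learning proceeds.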
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
