Provably Efficient Model-Free Constrained RL with Linear Function Approximation

Arnob Ghosh; Xingyu Zhou; Ness Shroff

arXiv:2206.11889 · cs.LG · January 10, 2023 · 6 cites

TL;DR

This paper introduces a novel model-free, simulator-free reinforcement learning algorithm for constrained Markov decision processes with linear function approximation, achieving sublinear regret and constraint violation bounds in large-scale systems.

Contribution

It develops the first model-free, simulator-free algorithm with provable sublinear regret and constraint violation bounds for constrained RL with linear function approximation.

Findings

01

Achieves $\tilde{O}(\sqrt{d^{3} H^{3} T})$ regret and constraint violation bounds.

02

Introduces primal-dual optimization into LSVI-UCB for balancing regret and constraints.

03

Employs a soft-max policy to enable uniform concentration and zero constraint violation.
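The three findings above can be illustrated with a minimal sketch. It assumes an LSVI-UCB-style optimistic estimate (ridge regression plus an elliptical bonus), a soft-max policy over the composite value `Q_r + dual * Q_g`, and a projected dual step on the constraint gap. The function names and the parameters `beta`, `reg`, `temperature`, `step`, and `dual_max` are hypothetical choices made for illustration; this is not the authors' implementation.

```python
import numpy as np


def optimistic_q(features, targets, query, beta=1.0, reg=1.0, h_max=1.0):
    """Ridge-regression Q estimate plus a UCB bonus, truncated at h_max.

    features: (n, d) feature vectors of past state-action pairs.
    targets:  (n,) regression targets (reward or utility plus next-step value).
    query:    (A, d) feature vectors of the actions to score.
    """
    d = features.shape[1]
    gram = reg * np.eye(d) + features.T @ features
    weights = np.linalg.solve(gram, features.T @ targets)
    gram_inv = np.linalg.inv(gram)
    # Elliptical bonus: beta * sqrt(phi^T gram^{-1} phi) for each query row.
    bonus = beta * np.sqrt(np.einsum("ad,de,ae->a", query, gram_inv, query))
    return np.minimum(query @ weights + bonus, h_max)


def softmax_policy(q_reward, q_utility, dual, temperature):
    """Soft-max action distribution over the composite value Q_r + dual * Q_g."""
    composite = q_reward + dual * q_utility
    logits = temperature * (composite - composite.max())  # shift for stability
    probs = np.exp(logits)
    return probs / probs.sum()


def dual_update(dual, utility_value, threshold, step, dual_max):
    """Raise the dual variable when the estimated utility falls below the threshold."""
    return float(np.clip(dual + step * (threshold - utility_value), 0.0, dual_max))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi_past = rng.normal(size=(50, 4))   # synthetic history, illustration only
    phi_actions = rng.normal(size=(3, 4))
    q_r = optimistic_q(phi_past, rng.uniform(size=50), phi_actions)
    q_g = optimistic_q(phi_past, rng.uniform(size=50), phi_actions)
    policy = softmax_policy(q_r, q_g, dual=0.5, temperature=5.0)
    v_g = float(policy @ q_g)             # estimated utility value under the policy
    print(policy, dual_update(0.5, v_g, threshold=0.8, step=0.1, dual_max=10.0))
```

The dual variable grows while the estimated utility stays below the threshold, which shifts the soft-max toward actions with higher utility estimates. A larger `temperature` makes the policy closer to greedy in the composite value.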

Abstract

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied by a "simulator", we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems. To this end, we consider the episodic constrained Markov decision processes with linear function approximation, where the transition dynamics and the reward function can be represented as a linear function of some known feature mapping. We show that $\tilde{O}(\sqrt{d^{3} H^{3} T})$ regret and $\tilde{O}(\sqrt{d^{3} H^{3} T})$ constraint violation bounds can be achieved, where $d$ is the dimension of the feature mapping, $H$ …
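The setting described in the abstract can be written compactly. The block below is a sketch in common episodic linear-MDP notation. The symbols $\phi$, $\mu_h$, $\theta_{r,h}$, $\theta_{g,h}$, the threshold $b$, the initial state $x_1$, the episode count $K$, and the relation $T = KH$ are assumptions taken from the standard formulation; the truncated abstract does not spell them out.

```latex
% Constrained objective: maximize reward value subject to a utility constraint.
\max_{\pi} \; V_{r,1}^{\pi}(x_1)
\quad \text{subject to} \quad
V_{g,1}^{\pi}(x_1) \ge b

% Linear structure with a known feature map \phi : S \times A \to \mathbb{R}^d.
\mathbb{P}_h(x' \mid x, a) = \langle \phi(x,a), \mu_h(x') \rangle, \qquad
r_h(x,a) = \langle \phi(x,a), \theta_{r,h} \rangle, \qquad
g_h(x,a) = \langle \phi(x,a), \theta_{g,h} \rangle

% Performance measures over K episodes of length H, with T = KH total steps.
\mathrm{Regret}(K) = \sum_{k=1}^{K} \left( V_{r,1}^{\pi^{*}}(x_1) - V_{r,1}^{\pi_k}(x_1) \right),
\qquad
\mathrm{Violation}(K) = \sum_{k=1}^{K} \left( b - V_{g,1}^{\pi_k}(x_1) \right)
```

Under this reading, a bound of order $\sqrt{T}$ on both quantities means the per-episode regret and the per-episode violation shrink as the number of episodes grows.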

Videos

Provably Efficient Model-Free Constrained RL with Linear Function Approximation · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research