Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces

Amirhossein Roknilamouki; Arnob Ghosh; Ming Shi; Fatemeh Nourzad; Eylem Ekici; Ness B. Shroff

arXiv:2502.18655 · cs.LG · February 27, 2025

TL;DR

This paper introduces provably efficient reinforcement learning algorithms for linear Markov decision processes with instantaneous safety constraints in non-convex feature spaces, achieving regret that grows sublinearly in the number of episodes and, with high probability, zero safety violations.

Contribution

The paper develops new techniques for bounding the covering number of the value-function class and proposes a two-phase algorithm for the non-star-convex case, extending safe RL to non-convex environments.

Findings

01

Achieves a regret bound of $\tilde{O} \bigl( (1 + \frac{1}{\tau}) \sqrt{\log(\frac{1}{\tau}) d^{3} H^{4} K} \bigr)$

02

Guarantees zero safety constraint violation with high probability

03

Demonstrates effectiveness through autonomous driving simulations
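
The scaling of the regret bound in the first finding can be explored numerically. The following is a minimal sketch, not code from the paper: the function names are hypothetical, constants and the polylogarithmic factors hidden by the O-tilde are dropped, and the range 0 < tau < 1 is an assumption made here so that log(1/tau) is positive. Only ratios between values are meaningful.

```python
import math


def regret_leading_term(d: int, H: int, K: int, tau: float) -> float:
    """Leading term (1 + 1/tau) * sqrt(log(1/tau) * d^3 * H^4 * K).

    Constants and polylog factors are omitted, so compare values
    only relative to one another.
    """
    # Assumption (not stated in the summary): 0 < tau < 1, so log(1/tau) > 0.
    if not 0.0 < tau < 1.0:
        raise ValueError("this sketch assumes 0 < tau < 1")
    return (1.0 + 1.0 / tau) * math.sqrt(math.log(1.0 / tau) * d**3 * H**4 * K)


def average_regret_term(d: int, H: int, K: int, tau: float) -> float:
    """Per-episode version of the leading term; it shrinks like 1/sqrt(K)."""
    return regret_leading_term(d, H, K, tau) / K
```

Because the term grows like the square root of K, quadrupling the number of episodes doubles `regret_leading_term`, while `average_regret_term` halves. That is the sense in which the bound is sublinear in K.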

Abstract

In Reinforcement Learning (RL), tasks with instantaneous hard constraints present significant challenges, particularly when the decision space is non-convex or non-star-convex. This issue is especially relevant in domains like autonomous vehicles and robotics, where constraints such as collision avoidance often take a non-convex form. In this paper, we establish a regret bound of $\tilde{O} \bigl( (1 + \frac{1}{\tau}) \sqrt{\log(\frac{1}{\tau}) d^{3} H^{4} K} \bigr)$, applicable to both star-convex and non-star-convex cases, where $d$ is the feature dimension, $H$ the episode length, $K$ the number of episodes, and $\tau$ the safety threshold. Moreover, the violation of safety constraints is zero with high probability throughout the learning process. A key technical challenge in these settings is bounding the covering number of the value-function class, which is…

Videos

Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning