Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces
Amirhossein Roknilamouki, Arnob Ghosh, Ming Shi, Fatemeh Nourzad, Eylem Ekici, Ness B. Shroff

TL;DR
This paper introduces provably efficient reinforcement learning algorithms for linear Markov decision processes (MDPs) with instantaneous safety constraints in non-convex feature spaces, achieving sublinear regret while incurring zero safety violations with high probability.
Contribution
It develops new techniques for bounding the covering number of the value-function class and proposes a two-phase algorithm for the non-star-convex case, advancing safe RL in complex, non-convex environments.
Findings
Achieves a regret bound of $\tilde{\mathcal{O}}\bigl((1 + 1/\tau)\sqrt{\log(1/\tau)\, d^3 H^4 K}\bigr)$
Guarantees zero safety constraint violation with high probability
Demonstrates effectiveness through autonomous driving simulations
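To get a feel for how the bound $(1 + 1/\tau)\sqrt{\log(1/\tau)\, d^3 H^4 K}$ scales, the short Python sketch below evaluates only that expression, dropping the constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$, so only ratios between calls carry meaning. The function name `regret_bound_scale` is hypothetical, and restricting $\tau$ to $(0, 1)$ (so that $\log(1/\tau) > 0$) is an assumption made here, not a statement from the paper.

```python
import math


def regret_bound_scale(d: int, H: int, K: int, tau: float) -> float:
    """Evaluate (1 + 1/tau) * sqrt(log(1/tau) * d^3 * H^4 * K).

    Constants and the logarithmic factors hidden by tilde-O are dropped,
    so only ratios between calls are meaningful.
    """
    # Assumption: tau lies in (0, 1), so that log(1/tau) > 0.
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    return (1.0 + 1.0 / tau) * math.sqrt(math.log(1.0 / tau) * d**3 * H**4 * K)


# Quadrupling the number of episodes K doubles the expression (sqrt(K) growth),
# so the average regret per episode shrinks like 1 / sqrt(K).
ratio = regret_bound_scale(4, 10, 4000, 0.5) / regret_bound_scale(4, 10, 1000, 0.5)
print(f"K x4 -> bound x{ratio:.2f}")
```

The same helper shows the other dependencies: doubling $d$ multiplies the expression by $2^{3/2}$, and as $\tau$ shrinks the $(1 + 1/\tau)$ factor dominates, reflecting that a tighter safety threshold makes learning harder.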
Abstract
In Reinforcement Learning (RL), tasks with instantaneous hard constraints present significant challenges, particularly when the decision space is non-convex or non-star-convex. This issue is especially relevant in domains like autonomous vehicles and robotics, where constraints such as collision avoidance often take a non-convex form. In this paper, we establish a regret bound of $\tilde{\mathcal{O}}\bigl((1 + 1/\tau)\sqrt{\log(1/\tau)\, d^3 H^4 K}\bigr)$, applicable to both star-convex and non-star-convex cases, where $d$ is the feature dimension, $H$ the episode length, $K$ the number of episodes, and $\tau$ the safety threshold. Moreover, the violation of safety constraints is zero with high probability throughout the learning process. A key technical challenge in these settings is bounding the covering number of the value-function class, which is…
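The abstract does not spell out how zero-violation learning is achieved. A common mechanism in safe linear bandits and linear MDPs, assumed here and not taken from the paper, is to model the instantaneous cost as linear in the features, $c(s,a) = \phi(s,a)^\top \gamma$, fit $\gamma$ by ridge regression, and only allow actions whose pessimistic estimate $\phi^\top \hat\gamma + \beta \lVert \phi \rVert_{\Lambda^{-1}}$ is at most $\tau$. The pure-Python sketch below illustrates that filter; the class `PessimisticCostModel` and the parameters `lam` and `beta` are hypothetical, and `beta` is simply taken as given rather than derived from a confidence bound. Because it checks a finite list of candidate feature vectors, the filter itself needs no convexity of the feature set; the paper's two-phase scheme for the non-star-convex case is not reproduced here.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting (A is small and positive definite here)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


class PessimisticCostModel:
    """Hypothetical ridge-regression model of a linear cost c(s, a) = phi(s, a)^T gamma."""

    def __init__(self, d: int, lam: float = 1.0, beta: float = 1.0):
        # Gram matrix Lambda = lam * I + sum of phi phi^T over observed samples.
        self.gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d  # sum of phi * observed cost
        self.beta = beta  # confidence width, assumed to be supplied by the user

    def update(self, phi: Vector, cost: float) -> None:
        """Add one observed (feature, cost) pair to the regression statistics."""
        for i in range(len(phi)):
            self.b[i] += phi[i] * cost
            for j in range(len(phi)):
                self.gram[i][j] += phi[i] * phi[j]

    def upper_cost(self, phi: Vector) -> float:
        """Pessimistic cost phi^T gamma_hat + beta * ||phi||_{Lambda^{-1}}."""
        gamma_hat = solve(self.gram, self.b)
        width = math.sqrt(dot(phi, solve(self.gram, phi)))
        return dot(phi, gamma_hat) + self.beta * width

    def certified_safe(self, features: List[Vector], tau: float) -> List[int]:
        """Indices of candidate actions whose pessimistic cost is at most tau."""
        return [k for k, phi in enumerate(features) if self.upper_cost(phi) <= tau]


model = PessimisticCostModel(d=2, lam=1.0, beta=0.1)
for _ in range(100):
    model.update([1.0, 0.0], 0.2)  # low-cost action
    model.update([0.0, 1.0], 0.8)  # high-cost action
print(model.certified_safe([[1.0, 0.0], [0.0, 1.0]], tau=0.5))
```

In this toy run only the low-cost action is certified. If no candidate passes the check, safe-exploration methods of this kind typically fall back on an action known in advance to be safe; whether and how the paper uses such an action is not stated in the text above.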
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
