Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

Toshinori Kitamura; Arnob Ghosh; Tadashi Kozuno; Wataru Kumagai; Kazumi Kasaura; Kenta Hoshino; Yohei Hosoe; Yutaka Matsuo

arXiv:2502.10138 · cs.LG · January 29, 2026

TL;DR

This paper introduces a computationally efficient RL algorithm for linear constrained MDPs that guarantees zero constraint violations per episode and achieves near-optimal regret bounds, advancing theoretical understanding in function approximation settings.

Contribution

The paper presents the first RL algorithm for linear CMDPs that achieves both zero episode-wise constraint violation and $\tilde{O}(\sqrt{K})$ regret, with computational cost polynomial in the problem parameters.

Findings

01

Achieves $\tilde{O}(\sqrt{K})$ regret in linear CMDPs.

02

Guarantees zero episode-wise constraint violation.

03

Scales polynomially with problem parameters, independent of state space size.

Abstract

We study the reinforcement learning (RL) problem in a constrained Markov decision process (CMDP), where an agent explores the environment to maximize the expected cumulative reward while satisfying a single constraint on the expected total utility value in every episode. While this problem is well understood in the tabular setting, theoretical results for function approximation remain scarce. This paper closes the gap by proposing an RL algorithm for linear CMDPs that achieves $\tilde{O}(\sqrt{K})$ regret with an episode-wise zero-violation guarantee. Furthermore, our method is computationally efficient, scaling polynomially with problem-dependent parameters while remaining independent of the state space size. Our results significantly improve upon recent linear CMDP algorithms, which either violate the constraint or incur exponential computational costs.
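
The setting described in the abstract can be written out as a short sketch. The notation here is assumed for illustration and is not taken from the paper: $\phi$ is a known $d$-dimensional feature map, $H$ is the horizon, $b$ is the constraint threshold (taken as a lower bound on the expected total utility), $s_1$ is a fixed initial state, and $\pi_k$ is the policy deployed in episode $k$.

```latex
% Assumed notation for illustration; the paper's own symbols may differ.
% Linear CMDP: transitions, reward r, and utility u are linear in a known
% feature map \phi : S \times A \to \mathbb{R}^d, for each step h = 1, ..., H.
P_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle, \quad
r_h(s, a) = \langle \phi(s, a), \theta_{r,h} \rangle, \quad
u_h(s, a) = \langle \phi(s, a), \theta_{u,h} \rangle

% Constrained objective with an assumed threshold b on expected total utility.
\pi^\star \in \arg\max_{\pi} V_r^{\pi}(s_1)
\quad \text{s.t.} \quad V_u^{\pi}(s_1) \ge b

% Regret over K episodes.
\mathrm{Regret}(K) = \sum_{k=1}^{K} \bigl( V_r^{\pi^\star}(s_1) - V_r^{\pi_k}(s_1) \bigr)

% Episode-wise zero violation: every deployed policy must itself be feasible.
V_u^{\pi_k}(s_1) \ge b \quad \text{for every } k \in \{1, \dots, K\}
```

Under this reading, the episode-wise requirement is stricter than a bound on the summed violation $\sum_{k} \bigl( b - V_u^{\pi_k}(s_1) \bigr)$, because a sum lets over-satisfying episodes offset violating ones, whereas the per-episode condition allows no such cancellation.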

Videos

Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation · SlidesLive

Taxonomy

Topics: Software-Defined Networks and 5G · Elevator Systems and Control · Advanced Optical Network Technologies