Near-Optimal Sample Complexity for Online Constrained MDPs

Chang Liu; Yunfan Li; Lin F. Yang

arXiv:2602.15076 · cs.LG · February 18, 2026

Open Access

TL;DR

This paper introduces a model-based primal-dual algorithm for online constrained MDPs that achieves near-optimal sample complexity, effectively balancing safety constraints and learning efficiency in both relaxed and strict feasibility settings.

Contribution

It provides the first near-optimal sample complexity bounds for online constrained MDPs, matching lower bounds and handling both relaxed and strict safety constraints.

Findings

01

Achieves $\tilde{O}(SAH^3/\epsilon^2)$ episodes for relaxed feasibility.

02

Achieves $\tilde{O}(SAH^5/(\epsilon^2 \zeta^2))$ episodes for strict feasibility.

03

Shows that CMDPs can be learned as efficiently as unconstrained MDPs when small violations are allowed.
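
To make the gap between the two bounds concrete, the sketch below evaluates only their leading-order terms, dropping the constants and logarithmic factors hidden by the $\tilde{O}$ notation. The function names and the example values of S, A, H, ε, and ζ are illustrative assumptions, not taken from the paper. This page does not define ζ; it is treated here as a feasibility-margin parameter in (0, 1].

```python
import math


def relaxed_episodes(S, A, H, eps):
    """Leading-order term S*A*H^3 / eps^2 (constants and log factors dropped)."""
    return S * A * H**3 / eps**2


def strict_episodes(S, A, H, eps, zeta):
    """Leading-order term S*A*H^5 / (eps^2 * zeta^2) (constants and log factors dropped)."""
    return S * A * H**5 / (eps**2 * zeta**2)


# Hypothetical problem sizes, chosen only for illustration.
S, A, H, eps, zeta = 10, 4, 20, 0.1, 0.5

relaxed = relaxed_episodes(S, A, H, eps)
strict = strict_episodes(S, A, H, eps, zeta)

# Dividing the two expressions cancels S, A, and eps, leaving H^2 / zeta^2.
ratio = strict / relaxed

print(f"relaxed ~ {relaxed:.3g} episodes")
print(f"strict  ~ {strict:.3g} episodes")
print(f"ratio   ~ {ratio:.3g}")
```

Under these leading-order terms, the strict-feasibility bound is larger than the relaxed one by a factor of $H^2/\zeta^2$, so the cost of forbidding all violations grows with the horizon and with a shrinking margin ζ.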

Abstract

Safety is a fundamental challenge in reinforcement learning (RL), particularly in real-world applications such as autonomous driving, robotics, and healthcare. To address this, Constrained Markov Decision Processes (CMDPs) are commonly used to enforce safety constraints while optimizing performance. However, existing methods often suffer from significant safety violations or require a high sample complexity to generate near-optimal policies. We address two settings: relaxed feasibility, where small violations are allowed, and strict feasibility, where no violation is allowed. We propose a model-based primal-dual algorithm that balances regret and bounded constraint violations, drawing on techniques from online RL and constrained optimization. For relaxed feasibility, we prove that our algorithm returns an $ε$-optimal policy with $ε$-bounded violation with arbitrarily…
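
For orientation, the block below writes out one common textbook formulation of a CMDP and the Lagrangian saddle-point problem that primal-dual methods typically work with. The notation ($V_r^{\pi}$, $V_c^{\pi}$, threshold $b$, multiplier $\lambda$) and the direction of the constraint are assumptions made for illustration; the paper's own notation and algorithm may differ.

```latex
% Constrained problem: maximize reward value subject to a constraint-value threshold b.
\max_{\pi} \; V_r^{\pi}
\quad \text{subject to} \quad V_c^{\pi} \ge b

% Lagrangian saddle-point form used by primal-dual methods.
\max_{\pi} \; \min_{\lambda \ge 0} \;
L(\pi, \lambda) = V_r^{\pi} + \lambda \left( V_c^{\pi} - b \right)

% Relaxed feasibility: the returned policy may violate the constraint by at most \epsilon.
V_c^{\pi} \ge b - \epsilon

% Strict feasibility: the returned policy must satisfy the constraint exactly.
V_c^{\pi} \ge b
```

In this form, the primal player updates the policy $\pi$ against the current multiplier, while the dual player raises $\lambda$ when the constraint is violated and lowers it otherwise.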

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research