Sample Complexity Bounds for Linear Constrained MDPs with a Generative Model
Xingtu Liu, Lin F. Yang, Sharan Vaswani

TL;DR
This paper develops sample complexity bounds for solving linear constrained MDPs with a generative model, providing near-optimal guarantees for both relaxed and strict feasibility cases using a primal-dual approach.
Contribution
The paper introduces a primal-dual framework leveraging any black-box unconstrained MDP solver for linear CMDPs and derives near-optimal sample complexity bounds for different feasibility settings.
Findings
Sample complexity for relaxed feasibility: $\frac{d^2}{(1-\gamma)^4\epsilon^2}$ samples.
Sample complexity for strict feasibility: $\frac{d^2}{(1-\gamma)^6\epsilon^2\zeta^2}$ samples.
Lower bound matches upper bounds up to logarithmic factors.
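To get a feel for how the two rates compare, the short sketch below evaluates the leading terms numerically. It drops all constants and logarithmic factors, so the numbers are only orders of magnitude, not sample counts anyone should rely on. The function names are hypothetical, and the third quantity in the strict-feasibility denominator is assumed here to be a problem-dependent Slater-type constant, called `zeta` in the code.

```python
def relaxed_bound(d, gamma, eps):
    """Leading term d^2 / ((1 - gamma)^4 * eps^2); constants and logs omitted."""
    return d**2 / ((1 - gamma) ** 4 * eps**2)


def strict_bound(d, gamma, eps, zeta):
    """Leading term d^2 / ((1 - gamma)^6 * eps^2 * zeta^2); constants and logs omitted.

    `zeta` is an assumed name for the problem-dependent constant in the bound.
    """
    return d**2 / ((1 - gamma) ** 6 * eps**2 * zeta**2)


if __name__ == "__main__":
    d, gamma, eps, zeta = 10, 0.9, 0.1, 0.5
    relaxed = relaxed_bound(d, gamma, eps)
    strict = strict_bound(d, gamma, eps, zeta)
    # The ratio equals 1 / ((1 - gamma)^2 * zeta^2), independent of d and eps.
    print(f"relaxed ~ {relaxed:.3e}, strict ~ {strict:.3e}, ratio ~ {strict / relaxed:.1f}")
```

Under these leading terms, the price of strict feasibility is a factor of $\frac{1}{(1-\gamma)^2\zeta^2}$, which grows quickly as the discount factor approaches 1 or the constant shrinks.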
Abstract
We consider infinite-horizon $\gamma$-discounted (linear) constrained Markov decision processes (CMDPs) where the objective is to find a policy that maximizes the expected cumulative reward subject to expected cumulative constraints. Given access to a generative model, we propose to solve CMDPs with a primal-dual framework that can leverage any black-box unconstrained MDP solver. For linear CMDPs with feature dimension $d$, we instantiate the framework by using mirror descent value iteration (MDVI) (Kitamura et al., 2023) as an example MDP solver. We provide sample complexity bounds for the resulting CMDP algorithm in two cases: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to exactly satisfy the constraint. For (i), we prove that the algorithm can return an $\epsilon$-optimal…
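The primal-dual idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a small tabular CMDP with a known transition tensor (no generative model, no linear features), a single constraint of the form "constraint value at least `b`", and plain value iteration standing in for the black-box solver instead of MDVI. All function names, the step size `eta`, and the dual cap `lam_max` are hypothetical choices made for illustration.

```python
import numpy as np


def value_iteration(P, r, gamma, iters=500):
    """Stand-in black-box unconstrained solver; returns a greedy deterministic policy.

    P has shape (S, A, S), r has shape (S, A).
    """
    v = np.zeros(P.shape[0])
    for _ in range(iters):
        q = r + gamma * (P @ v)  # shape (S, A)
        v = q.max(axis=1)
    return q.argmax(axis=1)


def policy_value(P, r, gamma, pi, rho):
    """Exact discounted value of deterministic policy pi under start distribution rho."""
    S = P.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, pi]  # (S, S) transition matrix induced by pi
    r_pi = r[idx, pi]  # (S,) reward vector induced by pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return float(rho @ v)


def primal_dual(P, r, c, b, gamma, rho, solver=value_iteration,
                T=200, eta=0.1, lam_max=10.0):
    """Alternate a primal solve on the Lagrangian reward with a projected dual step.

    Returns the average reward value, the average constraint value, and the final
    multiplier. The averages are the values of the uniform mixture over iterates.
    """
    lam = 0.0
    reward_values, constraint_values = [], []
    for _ in range(T):
        # Primal step: solve the unconstrained MDP with reward r + lam * c.
        pi = solver(P, r + lam * c, gamma)
        v_c = policy_value(P, c, gamma, pi, rho)
        reward_values.append(policy_value(P, r, gamma, pi, rho))
        constraint_values.append(v_c)
        # Dual step: raise lam when the constraint v_c >= b is violated, lower it otherwise.
        lam = float(np.clip(lam - eta * (v_c - b), 0.0, lam_max))
    return np.mean(reward_values), np.mean(constraint_values), lam


if __name__ == "__main__":
    # One state, two actions: action 0 earns reward, action 1 earns constraint reward.
    P = np.ones((1, 2, 1))
    r = np.array([[1.0, 0.0]])
    c = np.array([[0.0, 1.0]])
    print(primal_dual(P, r, c, b=1.0, gamma=0.5, rho=np.array([1.0])))
```

Because the solver is passed in as an argument, any routine with the same `(P, r, gamma)` signature could be substituted, which mirrors the black-box role the abstract describes. Averaging the per-iterate values corresponds to a policy that picks one iterate uniformly at random and follows it; a single iterate of this loop need not satisfy the constraint.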
