Near-Optimal Primal-Dual Algorithm for Learning Linear Mixture CMDPs with Adversarial Rewards

Kihyun Yu; Seoungbin Bae; Dabeen Lee

arXiv:2603.27884·cs.LG·March 31, 2026

TL;DR

This paper introduces a primal-dual policy optimization algorithm for learning in linear mixture CMDPs with adversarial rewards, achieving a near-optimal regret bound and a constraint violation bound of the same order.

Contribution

The authors present what is, to the best of their knowledge, the first provably efficient algorithm for linear mixture CMDPs with adversarial rewards, with a near-optimal regret bound obtained through a regularized dual update.

Findings

01

Achieves regret bounds of $\tilde{O}(\sqrt{d^{2} H^{3} K})$

02

First efficient algorithm for linear mixture CMDPs with adversarial rewards

03

Extends weighted ridge regression to constrained setting for tighter confidence intervals
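
As a rough illustration of the weighted ridge regression mentioned in the third finding, the sketch below solves the generic variance-weighted estimator $\hat{\theta} = (\lambda I + \sum_i x_i x_i^\top / \sigma_i^2)^{-1} \sum_i x_i y_i / \sigma_i^2$. The function name `weighted_ridge`, the regularizer `lam`, and the toy inputs are assumptions made for illustration; the paper's actual features, weights, and confidence sets are not given on this page.

```python
def weighted_ridge(features, targets, variances, lam=1.0):
    """Solve (lam*I + sum_i x_i x_i^T / s2_i) theta = sum_i x_i y_i / s2_i.

    Hypothetical helper: a generic variance-weighted ridge estimator,
    not the paper's exact construction.
    """
    d = len(features[0])
    # Gram matrix starts at lam * I; b accumulates the weighted targets.
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, y, s2 in zip(features, targets, variances):
        w = 1.0 / s2  # low-variance samples receive more weight
        for i in range(d):
            b[i] += w * x[i] * y
            for j in range(d):
                A[i][j] += w * x[i] * x[j]

    # Gaussian elimination with partial pivoting on the augmented system.
    M = [A[i][:] + [b[i]] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            factor = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= factor * M[col][c]

    # Back substitution.
    theta = [0.0] * d
    for i in reversed(range(d)):
        tail = sum(M[i][j] * theta[j] for j in range(i + 1, d))
        theta[i] = (M[i][d] - tail) / M[i][i]
    return theta


# Toy usage: the first sample has half the variance, so it counts twice as much.
theta_hat = weighted_ridge([[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0], [0.5, 1.0])
```

Setting every variance to 1 recovers ordinary ridge regression; smaller variances pull the estimate toward the corresponding samples.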

Abstract

We study safe reinforcement learning in finite-horizon linear mixture constrained Markov decision processes (CMDPs) with adversarial rewards under full-information feedback and an unknown transition kernel. We propose a primal-dual policy optimization algorithm that achieves regret and constraint violation bounds of $\tilde{O}(\sqrt{d^{2} H^{3} K})$ under mild conditions, where $d$ is the feature dimension, $H$ is the horizon, and $K$ is the number of episodes. To the best of our knowledge, this is the first provably efficient algorithm for linear mixture CMDPs with adversarial rewards. In particular, our regret bound is near-optimal, matching the known minimax lower bound up to logarithmic factors. The key idea is to introduce a regularized dual update that enables a drift-based analysis. This step is essential, as strong duality-based analysis cannot be directly applied when reward…
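
One generic way to picture a regularized dual update is the projected step below, in which the Lagrange multiplier is shrunk toward zero before moving along the estimated constraint violation. This is a minimal sketch under stated assumptions: the update form, the names `eta` (step size) and `alpha` (regularization strength), and the constraint convention (the utility value should be at least `threshold`) are illustrative and are not taken from the paper, whose exact update is not stated in the abstract.

```python
def regularized_dual_step(lam, value_estimate, threshold, eta=0.1, alpha=0.05):
    """One projected, regularized dual ascent step (hypothetical form).

    lam            : current Lagrange multiplier, assumed >= 0
    value_estimate : estimated constraint (utility) value of the current policy
    threshold      : required minimum constraint value
    """
    violation = threshold - value_estimate  # positive when the constraint is violated
    # The (1 - eta * alpha) factor is the regularization: it shrinks lam toward 0.
    new_lam = (1.0 - eta * alpha) * lam + eta * violation
    return max(0.0, new_lam)  # project back onto lam >= 0


def run_dual_updates(value_estimates, threshold, eta=0.1, alpha=0.05):
    """Apply the step once per episode and record the multiplier trajectory."""
    lam = 0.0
    history = []
    for v in value_estimates:
        lam = regularized_dual_step(lam, v, threshold, eta, alpha)
        history.append(lam)
    return history


# Toy usage: the constraint is violated by 0.5 in every episode.
trajectory = run_dual_updates([0.0] * 2000, threshold=0.5, eta=0.1, alpha=0.5)
```

In this sketch, a constant violation $c > 0$ drives the multiplier toward the fixed point $c / \alpha$ rather than letting it grow without bound, which is the effect the shrinkage term is meant to illustrate.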
