Primal-Dual Policy Optimization for Linear CMDPs with Adversarial Losses

Kihyun Yu; Seoungbin Bae; Dabeen Lee

arXiv:2605.11535 · cs.LG · May 13, 2026

TL;DR

This paper introduces a primal-dual policy optimization algorithm for adversarial linear CMDPs, achieving sublinear regret and constraint violation bounds in an online setting with adversarial losses.

Contribution

It presents the first algorithm with sublinear regret and violation bounds for adversarial linear CMDPs, using weighted LogSumExp softmax policies and novel analysis techniques.

Findings

01

Achieves $\tilde{O}(K^{3/4})$ regret and violation bounds.

02

Introduces weighted LogSumExp softmax policies for adversarial environments.

03

Validates theoretical results with numerical experiments.

Abstract

Existing work on linear constrained Markov decision processes (CMDPs) has primarily focused on stochastic settings, where the losses and costs are either fixed or drawn from fixed distributions. However, such formulations are inherently vulnerable to adversarially changing environments. To overcome this limitation, we propose a primal-dual policy optimization algorithm for online finite-horizon adversarial linear CMDPs, where the losses are adversarially chosen under full-information feedback and the costs are stochastic under bandit feedback. Our algorithm is the first to achieve sublinear regret and constraint violation bounds in this setting, both bounded by $O(K^{3/4})$, where $K$ denotes the number of episodes. The algorithm introduces and runs with a new class of policies, which we call weighted LogSumExp softmax policies, designed to adapt to…
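
To make the primal-dual idea concrete, here is a minimal sketch of a generic primal-dual loop on a toy single-state problem. It is not the paper's algorithm: it does not use linear function approximation or the weighted LogSumExp softmax policies, and it assumes the full cost vector is observed each episode, whereas the paper assumes bandit feedback on costs. The function name `primal_dual_sketch` and the parameters `eta`, `dual_lr`, and `lam_max` are hypothetical choices made for illustration.

```python
import numpy as np

def primal_dual_sketch(losses, costs, budget, eta=0.1, dual_lr=0.1, lam_max=10.0):
    """Generic primal-dual loop on a single-state problem (illustrative only).

    losses: (K, A) array, loss of each action per episode (full information).
    costs:  (K, A) array, cost of each action per episode (assumed observed).
    budget: per-episode cost threshold the policy should stay under.
    """
    K, A = losses.shape
    logits = np.zeros(A)  # parameters of a plain softmax policy
    lam = 0.0             # Lagrange multiplier (dual variable)
    policies = []
    for k in range(K):
        # Primal player: softmax over the negated accumulated Lagrangian losses.
        z = logits - logits.max()
        policy = np.exp(z) / np.exp(z).sum()
        policies.append(policy)
        # Exponential-weights step on the Lagrangian: loss + lam * cost.
        logits -= eta * (losses[k] + lam * costs[k])
        # Dual player: projected gradient ascent on the observed violation.
        violation = policy @ costs[k] - budget
        lam = float(np.clip(lam + dual_lr * violation, 0.0, lam_max))
    return np.array(policies), lam

# Usage on synthetic data (random losses stand in for an adversary).
rng = np.random.default_rng(0)
losses = rng.uniform(size=(200, 3))
costs = rng.uniform(size=(200, 3))
policies, lam = primal_dual_sketch(losses, costs, budget=0.5)
```

The multiplier `lam` grows while the played policy exceeds the budget, which shifts the primal update toward lower-cost actions. This trade-off between loss and constraint is what regret and violation bounds quantify.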
