Learning Weakly Communicating Average-Reward CMDPs: Strong Duality and Improved Regret

Kihyun Yu; Beomhan Baek; Dabeen Lee

arXiv:2605.11586·cs.LG·May 13, 2026

TL;DR

This paper establishes strong duality for weakly communicating average-reward CMDPs and introduces a primal-dual clipped value iteration algorithm that achieves improved regret bounds.

Contribution

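It proves strong duality for weakly communicating average-reward CMDPs over stationary policies with finite state and action spaces, a setting with no linear programming formulation, and develops a primal-dual clipped value iteration algorithm for linear CMDPs whose regret and constraint violation bounds improve on the best known ones.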

Findings

01

Strong duality holds for weakly communicating average-reward CMDPs.

02

The proposed algorithm achieves $\tilde{O}(T^{2/3})$ regret and constraint violation bounds.

03

The approach extends clipped value iteration to constrained, weakly communicating settings.
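
For readers unfamiliar with the term, strong duality in the first finding can be written as follows. This is a sketch in assumed notation, not the paper's own: $\Pi_S$ denotes the stationary policies, $J_r(\pi)$ and $J_g(\pi)$ the long-run average reward and average constraint value of $\pi$, and $b$ a threshold, with a single constraint of the form $J_g(\pi) \ge b$. The Lagrangian and the duality statement are then

$$L(\pi, \lambda) = J_r(\pi) + \lambda \bigl( J_g(\pi) - b \bigr), \qquad \sup_{\pi \in \Pi_S} \inf_{\lambda \ge 0} L(\pi, \lambda) = \inf_{\lambda \ge 0} \sup_{\pi \in \Pi_S} L(\pi, \lambda).$$

The left-hand side is the constrained problem itself, since the inner infimum is $-\infty$ for any infeasible $\pi$. The right-hand side is the dual problem that primal-dual methods work with, so equality is what justifies optimizing over $\lambda$ and $\pi$ in alternation.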

Abstract

We study infinite-horizon average-reward constrained Markov decision processes (CMDPs) under the weakly communicating assumption. Our contributions are twofold. First, we establish strong duality for weakly communicating average-reward CMDPs over stationary policies with finite state and action spaces. Despite the absence of a linear programming formulation and the resulting nonconvexity under the weakly communicating setting, we show that strong duality still holds by carefully exploiting the geometric structure of the occupation measure set. Second, building on this result, we propose a primal-dual clipped value iteration algorithm for learning weakly communicating average-reward linear CMDPs. Our algorithm achieves regret and constraint violation bounds of $\tilde{O}(T^{2/3})$, improving upon the best known bounds, where $T$ denotes the number of interactions. Our…
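
To make the primal-dual idea concrete, here is a minimal Python sketch on a toy problem. It is not the paper's algorithm: it assumes a known tabular model, uses plain relative value iteration as the primal step (no clipping, no learning, no linear function approximation), and uses a projected subgradient step on the multiplier. The function names, the toy MDP, the step size `eta`, the cap `lam_max`, and the constraint form "average `g` at least `b`" are all illustrative assumptions.

```python
import numpy as np


def relative_value_iteration(P, r, iters=200):
    """Approximate the optimal gain and a greedy policy of a known MDP.

    P has shape (S, A, S) and r has shape (S, A).
    """
    h = np.zeros(P.shape[0])
    for _ in range(iters):
        v = (r + P @ h).max(axis=1)   # Bellman backup over actions
        h = v - v[0]                  # re-center on state 0 so h stays bounded
    q = r + P @ h
    gain = q.max(axis=1)[0] - h[0]
    return gain, q.argmax(axis=1)


def average_value(P, f, policy, iters=200):
    """Long-run average of f under a deterministic policy (power iteration)."""
    S = P.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, policy]             # (S, S) transition matrix of the policy
    mu = np.full(S, 1.0 / S)
    for _ in range(iters):
        mu = mu @ P_pi                # push the state distribution forward
    return float(mu @ f[idx, policy])


def primal_dual(P, r, g, b, eta=0.5, rounds=400, lam_max=10.0):
    """Alternate a best response to r + lam * g with a dual step on lam."""
    lam, avg_r, avg_g = 0.0, 0.0, 0.0
    for k in range(1, rounds + 1):
        # Primal step: best policy for the Lagrangian reward.
        _, policy = relative_value_iteration(P, r + lam * g)
        j_r = average_value(P, r, policy)
        j_g = average_value(P, g, policy)
        avg_r += (j_r - avg_r) / k    # running averages over rounds
        avg_g += (j_g - avg_g) / k
        # Dual step: raise lam when the constraint j_g >= b is violated.
        lam = min(max(lam - eta * (j_g - b), 0.0), lam_max)
    return lam, avg_r, avg_g


# Toy model (assumed): action 0 drifts to state 0, action 1 to state 1.
P = np.zeros((2, 2, 2))
P[:, 0] = [0.9, 0.1]
P[:, 1] = [0.1, 0.9]
r = np.array([[1.0, 1.0], [0.0, 0.0]])   # reward is earned in state 0
g = np.array([[0.0, 0.0], [1.0, 1.0]])   # constraint signal is earned in state 1

lam, avg_r, avg_g = primal_dual(P, r, g, b=0.5)
print(f"lambda={lam:.2f}  avg reward={avg_r:.3f}  avg constraint={avg_g:.3f}")
```

In this toy model the best response flips between "always drift to state 0" and "always drift to state 1" as the multiplier crosses 1, so no single iterate satisfies the constraint tightly. The running averages are reported instead, and they approach the threshold `b` as the number of rounds grows.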
