Asymptotically Optimal Policies for Weakly Coupled Markov Decision Processes

Diego Goldsztajn; Konstantin Avrachenkov

arXiv:2406.04751 · math.OC · April 1, 2026

TL;DR

This paper develops asymptotically optimal policies for weakly coupled Markov decision processes, a class that generalizes restless bandits, by connecting them to a deterministic control problem and giving explicit policy constructions.

Contribution

It links large-scale weakly coupled MDPs to a deterministic control problem whose solution yields explicit asymptotically optimal policies, and it gives sufficient conditions for such a solution to exist.
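The link between many processes and a deterministic problem can be illustrated with a minimal sketch, assuming a single hypothetical two-state, two-action process: the transition table `P` and the randomized policy `POLICY` below are made-up values, not taken from the paper. When $n$ processes independently follow the same stationary randomized policy, their empirical state distribution stays close to the deterministic sequence $m_{t+1}(s') = \sum_{s,a} m_t(s)\,\pi(a \mid s)\,P(s' \mid s, a)$, which is the intuition behind a fluid limit as $n \to \infty$. The sketch deliberately ignores the coupling constraints; in this picture they become constraints on $m$, such as a bound on `active_fraction(m)`.

```python
import random

# Hypothetical two-state, two-action process (action 1 = "active", 0 = "passive").
# P[s][a][s2] is the probability of moving from state s to s2 under action a.
P = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.5, 0.5], [0.2, 0.8]],
]
# Hypothetical randomized stationary policy: POLICY[s][a] = Pr(action a | state s).
POLICY = [[0.3, 0.7], [0.8, 0.2]]


def fluid_step(m):
    """Deterministic update m'(s2) = sum_{s,a} m(s) * pi(a|s) * P(s2|s,a)."""
    nxt = [0.0, 0.0]
    for s in range(2):
        for a in range(2):
            for s2 in range(2):
                nxt[s2] += m[s] * POLICY[s][a] * P[s][a][s2]
    return nxt


def active_fraction(m):
    """Fraction of processes taking the active action when states are distributed as m."""
    return sum(m[s] * POLICY[s][1] for s in range(2))


def simulate(n, steps, seed=0):
    """Run n independent processes (all starting in state 0); return the empirical state distribution."""
    rng = random.Random(seed)
    states = [0] * n
    for _ in range(steps):
        for i, s in enumerate(states):
            a = 1 if rng.random() < POLICY[s][1] else 0
            states[i] = 1 if rng.random() < P[s][a][1] else 0
    frac1 = sum(states) / n
    return [1.0 - frac1, frac1]


m = [1.0, 0.0]  # fluid counterpart of n processes that all start in state 0
for _ in range(20):
    m = fluid_step(m)
emp = simulate(n=20_000, steps=20)
print(m, emp, active_fraction(m))
```

With 20,000 simulated processes the empirical fraction in state 1 typically lies within about 0.01 of the fluid value, and the gap shrinks roughly like $1/\sqrt{n}$.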

Findings

1. The constructed policies achieve the maximum expected average reward as the number $n$ of processes grows large.
2. Given certain assumptions on the constraints, the sufficient conditions for a fluid solution to exist, and hence for asymptotically optimal policies, are satisfied under common assumptions such as unichain structure and aperiodicity.
3. Numerical experiments confirm the theoretical results in multichain setups.

Abstract

We consider the problem of maximizing the expected average reward obtained over an infinite time horizon by $n$ weakly coupled Markov decision processes. Our setup is a substantial generalization of the multi-armed restless bandit problem that allows for multiple actions and constraints. We establish a connection with a deterministic and continuous-variable control problem where the objective is to maximize the average reward derived from an occupancy measure that represents the empirical distribution of the processes when $n \to \infty$. We show that a solution of this fluid problem can be used to construct policies for the weakly coupled processes that achieve the maximum expected average reward as $n \to \infty$, and we give sufficient conditions for the existence of solutions. Under certain assumptions on the constraints, we prove that these conditions are automatically satisfied if…
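For intuition about the fluid problem, consider only its stationary special case, which reduces to the linear program familiar from restless-bandit relaxations: choose an occupancy measure $x(s,a) \ge 0$ with $\sum_{s,a} x(s,a) = 1$ that satisfies the balance equations $\sum_a x(s',a) = \sum_{s,a} x(s,a)\,P(s' \mid s,a)$ and a budget $\sum_s x(s,1) \le \alpha$ on the long-run fraction of active processes, and maximize $\sum_{s,a} x(s,a)\,r(s,a)$. The sketch below is a hypothetical illustration, not the authors' method: `P`, `R` and `ALPHA` are made-up, and, to stay dependency-free, it replaces an LP solver with a brute-force grid search over stationary randomized policies of a two-state process, each of which induces an occupancy measure satisfying the balance equations. Since the abstract describes the fluid problem as a control problem, this fixed-point version covers only a simplified case.

```python
# Hypothetical two-state process; action 1 = "active", action 0 = "passive".
# P[s][a][s2] = probability of moving from s to s2 under action a.
P = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.5, 0.5], [0.2, 0.8]],
]
# R[s][a] = hypothetical per-step reward (state 1 pays 1.0, activation costs 0.1).
R = [[0.0, -0.1], [1.0, 0.9]]
ALPHA = 0.4  # hypothetical budget on the long-run fraction of active processes


def occupancy(p_active):
    """Stationary occupancy measure x[s][a] when state s is activated w.p. p_active[s]."""
    pol = [[1.0 - p, p] for p in p_active]
    q01 = sum(pol[0][a] * P[0][a][1] for a in range(2))  # prob. of moving 0 -> 1
    q10 = sum(pol[1][a] * P[1][a][0] for a in range(2))  # prob. of moving 1 -> 0
    m1 = q01 / (q01 + q10)  # stationary mass of state 1 in a two-state chain
    m = [1.0 - m1, m1]
    return [[m[s] * pol[s][a] for a in range(2)] for s in range(2)]


def reward(x):
    """Average reward sum_{s,a} x(s,a) * r(s,a) of an occupancy measure."""
    return sum(x[s][a] * R[s][a] for s in range(2) for a in range(2))


def solve_fluid_grid(steps=100):
    """Grid search over activation probabilities; returns (reward, p_active, x)."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1):
            p_active = (i / steps, j / steps)
            x = occupancy(p_active)
            if x[0][1] + x[1][1] > ALPHA + 1e-12:  # budget constraint violated
                continue
            value = reward(x)
            if best is None or value > best[0]:
                best = (value, p_active, x)
    return best


value, p_active, x = solve_fluid_grid()
print(value, p_active)
```

A practical implementation would more likely pass the same constraints to an LP solver such as `scipy.optimize.linprog`; the grid search only approximates the optimum up to the grid resolution.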
