MARBLE: Multi-Armed Restless Bandits in Latent Markovian Environment

Mohsen Amiri; Konstantin Avrachenkov; Ibtihal El Mimouni; Sindri Magnússon

arXiv:2511.09324·cs.LG·April 13, 2026

TL;DR

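This paper introduces MARBLE, a restless bandit model whose arm dynamics are driven by a latent Markov environment state, and proves that synchronous Q-learning with Whittle Indices (QWI) converges almost surely under a relaxed indexability criterion, even though the regime switches are unobserved. The model is validated on a recommender system simulator.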

Contribution

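It proposes MARBLE, an extension of RMABs in which a latent Markov state induces nonstationary arm dynamics, introduces the Markov-Averaged Indexability (MAI) criterion as a relaxed indexability assumption, and establishes almost-sure convergence of synchronous Q-learning with Whittle Indices (QWI) under MAI.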
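
As a rough illustration of Q-learning with Whittle Indices, the sketch below runs a two-timescale tabular update for a single arm: a fast Q-update under a passive-action subsidy, and a slow subsidy update that drives the active and passive Q-values together at each reference state. The update form, the constant step sizes, and the names `qwi_sketch`, `reward`, and `sample_next` are assumptions made for illustration, not the paper's exact algorithm. Convergence analyses typically rely on decaying step sizes.

```python
def qwi_sketch(n_states, reward, sample_next, n_iters=5000,
               gamma=0.9, alpha=0.1, beta=0.01):
    """Two-timescale tabular sketch; returns one index estimate per state.

    reward(s, a)      -> float, hypothetical reward function
    sample_next(s, a) -> int, hypothetical simulator call; any latent
                         environment state stays hidden inside it
    """
    S = n_states
    # Q[x][s][a]: Q-table under the subsidy lam[x] tied to reference state x.
    Q = [[[0.0, 0.0] for _ in range(S)] for _ in range(S)]
    lam = [0.0] * S
    for _ in range(n_iters):
        # Draw one transition per (state, action) and reuse it for every x.
        nxt = [[sample_next(s, a) for a in (0, 1)] for s in range(S)]
        for x in range(S):
            # Fast timescale: synchronous Q-update over all (s, a) pairs.
            new_Q = [[0.0, 0.0] for _ in range(S)]
            for s in range(S):
                for a in (0, 1):
                    # The passive action (a = 0) earns the subsidy lam[x].
                    target = (reward(s, a) + (1 - a) * lam[x]
                              + gamma * max(Q[x][nxt[s][a]]))
                    new_Q[s][a] = (1 - alpha) * Q[x][s][a] + alpha * target
            # Slow timescale: move the subsidy toward the value at which the
            # arm is indifferent between acting and resting in state x.
            lam[x] += beta * (Q[x][x][1] - Q[x][x][0])
            Q[x] = new_Q
    return lam
```

A handy sanity check for a sketch like this is an arm whose transitions ignore the action: the future value is then identical under both actions, so the indifference subsidy in each state reduces to the reward gap `reward(s, 1) - reward(s, 0)`.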

Findings

01

QWI adapts effectively to shifting latent states.

02

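Under the MAI criterion, synchronous QWI converges almost surely to the optimal Q-function and the corresponding Whittle indices, despite unobserved regime switches.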

03

MARBLE's approach is validated on a digital twin recommender system.

Abstract

Restless Multi-Armed Bandits (RMABs) are powerful models for decision-making under uncertainty, yet classical formulations typically assume fixed dynamics, an assumption often violated in nonstationary environments. We introduce MARBLE (Multi-Armed Restless Bandits in a Latent Markovian Environment), which augments RMABs with a latent Markov state that induces nonstationary behavior. In MARBLE, each arm evolves according to a latent environment state that switches over time, making policy learning substantially more challenging. We further introduce the Markov-Averaged Indexability (MAI) criterion as a relaxed indexability assumption and prove that, despite unobserved regime switches, under the MAI criterion, synchronous Q-learning with Whittle Indices (QWI) converges almost surely to the optimal Q-function and the corresponding Whittle indices. We validate MARBLE on a calibrated…
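
To make the latent-environment idea in the abstract concrete, here is a minimal simulation sketch of a single arm whose transition kernel depends on a hidden regime that itself follows a Markov chain. The two-regime, two-state setup, all probabilities, and the names `simulate_arm`, `P_env`, and `P_arm` are hypothetical choices for illustration and do not come from the paper.

```python
import random

def sample(probs, rng):
    """Draw an index from a probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def simulate_arm(P_env, P_arm, actions, rng, s0=0, z0=0):
    """Simulate one arm driven by a hidden environment chain.

    P_env[z]       : next-regime distribution given latent regime z
    P_arm[z][a][s] : next-state distribution given regime z, action a, state s
    actions        : sequence of 0 (passive) / 1 (active) decisions
    Returns (arm states, latent regimes); a learner would see only the former.
    """
    z, s = z0, s0
    states, envs = [s], [z]
    for a in actions:
        s = sample(P_arm[z][a][s], rng)  # arm moves under the current regime
        z = sample(P_env[z], rng)        # regime switches, unobserved
        states.append(s)
        envs.append(z)
    return states, envs

# Hypothetical numbers: two sticky regimes with different arm dynamics.
P_env = [[0.95, 0.05],
         [0.10, 0.90]]
P_arm = [
    [[[0.9, 0.1], [0.6, 0.4]],   # regime 0, passive
     [[0.3, 0.7], [0.1, 0.9]]],  # regime 0, active
    [[[0.5, 0.5], [0.8, 0.2]],   # regime 1, passive
     [[0.6, 0.4], [0.4, 0.6]]],  # regime 1, active
]

states, envs = simulate_arm(P_env, P_arm, [0, 1] * 50, random.Random(0))
```

From the learner's side only `states` is visible, so the arm appears to follow time-varying dynamics even though each regime's kernel is fixed.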
