Multi-Action Restless Bandits with Weakly Coupled Constraints: Simultaneous Learning and Control

Jing Fu; Bill Moran; José Niño-Mora

arXiv:2412.03326·math.OC·December 5, 2024

TL;DR

This paper addresses online learning and control in multi-action restless bandit systems with weakly coupled constraints, proposing algorithms proven to converge to optimality exponentially fast in the number of bandit processes.

Contribution

It introduces a novel online scheme for simultaneous learning and control in complex multi-action bandit systems with weakly coupled constraints, with proven convergence properties.

Findings

01

Proposed algorithms converge exponentially fast in the number of bandits.

02

The scheme achieves performance close to offline optimal solutions.

03

Convergence is established both in time and in the number of processes.

Abstract

We study a system with finitely many groups of multi-action bandit processes, each of which is a Markov decision process (MDP) with finite state and action spaces and potentially different transition matrices under different actions. The bandit processes of the same group share the same state and action spaces and, under any given action, the same transition matrix. All the bandit processes across the groups are subject to multiple weakly coupled constraints over their state and action variables. Unlike past studies, which focused on the offline case, we consider the online case without assuming full knowledge of the transition matrices and reward functions a priori, and propose an effective scheme that enables simultaneous learning and control. We prove the convergence of the relevant processes in both the timeline and the number of the bandit processes, referred to…
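To make the setting in the abstract concrete, the following is a minimal sketch of one group of bandit processes that share per-action transition matrices, controlled under a single coupling constraint while the transition probabilities are estimated from observed counts. It is not the paper's scheme: the class `BanditGroup`, the helpers `choose_actions` and `run_demo`, the two-state and two-action matrices, the random policy, and the budget-style constraint (at most `budget` processes take action 1 per step) are all assumptions chosen for illustration.

```python
import random


class BanditGroup:
    """Hypothetical container: all processes in a group share the same
    state space, action space, and per-action transition matrices."""

    def __init__(self, n_states, n_actions, transitions, n_processes, rng):
        self.n_states = n_states
        self.n_actions = n_actions
        # transitions[a][s] is the probability row over next states.
        self.transitions = transitions
        self.states = [0] * n_processes
        self.rng = rng
        # counts[a][s][s_next]: observed transitions, pooled over the group.
        self.counts = [
            [[0] * n_states for _ in range(n_states)] for _ in range(n_actions)
        ]

    def step(self, actions):
        """Apply one action per process and record what was observed."""
        for i, a in enumerate(actions):
            s = self.states[i]
            s_next = self.rng.choices(
                range(self.n_states), weights=self.transitions[a][s]
            )[0]
            self.counts[a][s][s_next] += 1
            self.states[i] = s_next

    def estimate(self, a, s):
        """Empirical transition row; uniform if (a, s) was never observed."""
        row = self.counts[a][s]
        total = sum(row)
        if total == 0:
            return [1.0 / self.n_states] * self.n_states
        return [c / total for c in row]


def choose_actions(n_processes, budget, rng):
    """Random policy (an assumption) satisfying one coupling constraint:
    exactly `budget` processes take action 1, the rest take action 0."""
    active = set(rng.sample(range(n_processes), budget))
    return [1 if i in active else 0 for i in range(n_processes)]


def run_demo(steps=200, n_processes=50, budget=10, seed=0):
    rng = random.Random(seed)
    # Illustrative matrices only; the true ones are unknown to the learner.
    true_p = [
        [[0.7, 0.3], [0.4, 0.6]],  # action 0
        [[0.2, 0.8], [0.9, 0.1]],  # action 1
    ]
    group = BanditGroup(2, 2, true_p, n_processes, rng)
    for _ in range(steps):
        actions = choose_actions(n_processes, budget, rng)
        group.step(actions)
    return group, true_p


if __name__ == "__main__":
    group, true_p = run_demo()
    for a in range(2):
        for s in range(2):
            print(a, s, [round(p, 3) for p in group.estimate(a, s)], true_p[a][s])
```

Because processes in a group share their transition matrices, observations from every process can be pooled into one count table, so the estimates sharpen as either the time horizon or the number of processes grows. The sketch only illustrates that pooling; it makes no claim about the convergence rates proved in the paper.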

Taxonomy

Topics: Advanced Bandit Algorithms Research · Mind wandering and attention · Smart Grid Energy Management