Restless Bandit Problem with Rewards Generated by a Linear Gaussian Dynamical System

Jonathan Gornet; Bruno Sinopoli

arXiv:2405.09584·stat.ML·May 24, 2024

Open Access

TL;DR

This paper addresses the restless bandit problem in which rewards are generated by a linear Gaussian dynamical system, using a modified Kalman filter to predict the reward of each action.

Contribution

It proposes a new reward prediction method using a learned Kalman filter that leverages information across actions and time, enhancing bandit algorithms for linear Gaussian systems.

Findings

01. The method outperforms two established bandit algorithms in numerical tests.

02. Reward predictions can be effectively shared across actions and time.

03. The approach demonstrates robustness in various linear Gaussian dynamical systems.

Abstract

Decision-making under uncertainty is a fundamental problem encountered frequently and can be formulated as a stochastic multi-armed bandit problem. In the problem, the learner interacts with an environment by choosing an action at each round, where a round is an instance of an interaction. In response, the environment reveals a reward, which is sampled from a stochastic process, to the learner. The goal of the learner is to maximize cumulative reward. In this work, we assume that the rewards are the inner product of an action vector and a state vector generated by a linear Gaussian dynamical system. To predict the reward for each action, we propose a method that takes a linear combination of previously observed rewards for predicting each action's next reward. We show that, regardless of the sequence of previous actions chosen, the reward sampled for any previously chosen action can be…
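The reward model described in the abstract can be illustrated with a minimal sketch. It assumes a state update of the form x_{t+1} = A x_t + w_t with Gaussian noise and a noisy reward equal to the inner product of the chosen action vector and the state. The function names, the noise scales `q_std` and `r_std`, and the example matrices are hypothetical and do not come from the paper. The predictor is an ordinary least-squares fit over a window of past rewards for a single repeatedly played action; it only illustrates the idea of a linear combination of previously observed rewards and is not the authors' learned Kalman filter.

```python
import numpy as np


def simulate_lgds_rewards(A, actions, chosen, q_std=0.1, r_std=0.1, seed=0):
    """Simulate rewards <a_k, x_t> + v_t with x_{t+1} = A x_t + w_t.

    A       : (d, d) state transition matrix (assumed stable)
    actions : (K, d) array, one action vector per row
    chosen  : sequence of action indices, one per round
    q_std, r_std : hypothetical process / reward noise scales
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    x = rng.normal(size=d)  # random initial state
    rewards = np.empty(len(chosen))
    for t, k in enumerate(chosen):
        # Reward is the inner product of the action and the hidden state, plus noise.
        rewards[t] = actions[k] @ x + r_std * rng.normal()
        # The state evolves regardless of which action was played (restless).
        x = A @ x + q_std * rng.normal(size=d)
    return rewards


def fit_linear_predictor(rewards, window):
    """Least-squares weights mapping the last `window` rewards to the next one."""
    X = np.array([rewards[t - window:t] for t in range(window, len(rewards))])
    y = rewards[window:]
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


# Example: two actions, the first one played in every round.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
actions = np.array([[1.0, 0.0], [0.0, 1.0]])
chosen = np.zeros(500, dtype=int)

rewards = simulate_lgds_rewards(A, actions, chosen)
weights = fit_linear_predictor(rewards, window=5)
prediction = rewards[-5:] @ weights  # predicted next reward for action 0
```

Because the hidden state keeps evolving whether or not an action is played, past rewards carry information about future ones, which is what makes a linear predictor over the reward history plausible in this setting.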

Taxonomy

Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing

Methods: Sparse Evolutionary Training