Restless Bandit Problem with Rewards Generated by a Linear Gaussian Dynamical System
Jonathan Gornet, Bruno Sinopoli

TL;DR
This paper addresses the restless bandit problem in which rewards are generated by a linear Gaussian dynamical system, using a modified Kalman filter to predict rewards and improve decision-making under uncertainty.
Contribution
It proposes a new reward prediction method using a learned Kalman filter that leverages information across actions and time, enhancing bandit algorithms for linear Gaussian systems.
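To make the idea of sharing information across actions concrete, here is a minimal sketch of a textbook Kalman filter step for a scalar reward observation. It is not the paper's learned filter: it assumes the system matrices are known, and every name in it (`kalman_step`, `A`, `Q`, `r_var`, the example action vectors) is hypothetical and chosen only for illustration.

```python
import numpy as np

def kalman_step(x_hat, P, action, reward, A, Q, r_var):
    """One update-then-predict step of a standard Kalman filter.

    Assumes (for illustration only) a known model:
        state:  x_{t+1} = A x_t + w_t,       w_t ~ N(0, Q)
        reward: r_t     = <action, x_t> + v, v   ~ N(0, r_var)
    """
    # Measurement update using the reward of the action that was played.
    innovation = reward - action @ x_hat
    S = action @ P @ action + r_var          # innovation variance (scalar)
    K = P @ action / S                       # Kalman gain (vector)
    x_post = x_hat + K * innovation
    P_post = P - np.outer(K, action @ P)

    # Time update: propagate the estimate to the next round.
    x_next = A @ x_post
    P_next = A @ P_post @ A.T + Q
    return x_next, P_next

# Hypothetical two-dimensional system with two actions.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
Q = 0.01 * np.eye(2)
actions = np.array([[1.0, 0.0], [0.6, 0.8]])

x_hat, P = np.zeros(2), np.eye(2)
# Play action 0 and observe a reward of 1.0.
x_hat, P = kalman_step(x_hat, P, actions[0], reward=1.0, A=A, Q=Q, r_var=0.01)

# Predicted next-round reward for every action, including the unplayed one.
predicted = actions @ x_hat
```

Because both actions read the same underlying state, a reward observed for action 0 also shifts the prediction for action 1, which is the sense in which predictions can be shared across actions.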
Findings
The method outperforms two established bandit algorithms in numerical tests.
Reward predictions can be effectively shared across actions and time.
The approach is robust across a range of linear Gaussian dynamical systems.
Abstract
Decision-making under uncertainty is a fundamental problem encountered frequently and can be formulated as a stochastic multi-armed bandit problem. In the problem, the learner interacts with an environment by choosing an action at each round, where a round is an instance of an interaction. In response, the environment reveals a reward, which is sampled from a stochastic process, to the learner. The goal of the learner is to maximize cumulative reward. In this work, we assume that the rewards are the inner product of an action vector and a state vector generated by a linear Gaussian dynamical system. To predict the reward for each action, we propose a method that takes a linear combination of previously observed rewards for predicting each action's next reward. We show that, regardless of the sequence of previous actions chosen, the reward sampled for any previously chosen action can be…
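The reward model described in the abstract can be sketched as follows. This is a simplified illustration rather than the paper's method: it simulates a hypothetical linear Gaussian dynamical system, keeps a single fixed action, and fits the weights of a linear combination of past rewards by ordinary least squares. The function names, system matrices, noise levels, and window length are all assumptions made for the example.

```python
import numpy as np

def simulate_rewards(A, c, n_rounds, state_noise=0.1, reward_noise=0.1, seed=0):
    """Rewards r_t = <c, x_t> + noise, with x_{t+1} = A x_t + noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[0])
    rewards = np.empty(n_rounds)
    for t in range(n_rounds):
        rewards[t] = c @ x + reward_noise * rng.standard_normal()
        x = A @ x + state_noise * rng.standard_normal(A.shape[0])
    return rewards

def fit_linear_predictor(rewards, window):
    """Least-squares weights mapping the last `window` rewards to the next one."""
    X = np.stack([rewards[t - window:t] for t in range(window, len(rewards))])
    y = rewards[window:]
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def predict_next(rewards, weights):
    """Predict the next reward as a linear combination of the most recent ones."""
    return float(rewards[-len(weights):] @ weights)

# Hypothetical stable system and action vector.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
c = np.array([1.0, 0.5])
window = 5

rewards = simulate_rewards(A, c, n_rounds=2000)
weights = fit_linear_predictor(rewards[:1500], window)

# Compare one-step predictions on held-out rounds against always predicting zero.
preds = np.array([rewards[t - window:t] @ weights for t in range(1500, 2000)])
mse_pred = np.mean((preds - rewards[1500:]) ** 2)
mse_zero = np.mean(rewards[1500:] ** 2)
```

On this toy system the fitted predictor should have a lower held-out error than the zero baseline, since the rewards inherit temporal correlation from the state. The setting in the abstract is harder, because the chosen action changes from round to round.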
Taxonomy
Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing
Methods: Sparse Evolutionary Training
