
TL;DR
This paper extends linear bandit models to time-varying parameters that are statistically dependent over time, and proposes an algorithm with sub-linear regret that uses coupling techniques to handle long-range dependencies.
Contribution
It introduces a generalized restless linear bandit framework with dependent parameters and proposes LinMix-UCB, an algorithm with provable regret bounds under exponential mixing conditions.
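As a concrete picture of what an exponential mixing condition describes, here is a minimal sketch that simulates a stationary scalar Gaussian AR(1) process $x_t = \rho x_{t-1} + \sigma \varepsilon_t$ with $|\rho| < 1$, a textbook example of a sequence whose dependence fades geometrically. The process, the function names (`simulate_ar1`, `autocorrelation`), and the use of empirical autocorrelation (which decays like $\rho^k$ at lag $k$) as a visible proxy for the mixing rate are assumptions made for illustration only; autocorrelation is not the mixing coefficient analysed in the paper, and this toy process is not claimed to satisfy the paper's exact assumptions.

```python
import math
import random


def simulate_ar1(rho: float, sigma: float, n: int, seed: int = 0) -> list[float]:
    """Simulate a stationary scalar AR(1) process x_t = rho * x_{t-1} + sigma * eps_t."""
    rng = random.Random(seed)
    # Draw x_0 from the stationary law N(0, sigma^2 / (1 - rho^2)) so the series is stationary.
    x = rng.gauss(0.0, sigma / math.sqrt(1.0 - rho * rho))
    xs = [x]
    for _ in range(n - 1):
        x = rho * x + sigma * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


def autocorrelation(xs: list[float], lag: int) -> float:
    """Empirical autocorrelation of xs at the given lag (biased estimator, 1/n normalisation)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs) / n
    cov = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var


if __name__ == "__main__":
    rho = 0.8
    xs = simulate_ar1(rho, sigma=1.0, n=50_000)
    for lag in (1, 2, 5, 10, 20):
        # For AR(1) the theoretical value is rho**lag, i.e. geometric (exponential) decay.
        print(lag, round(autocorrelation(xs, lag), 3), round(rho ** lag, 3))
```

Comparing the empirical column with `rho ** lag` shows dependence between values `lag` steps apart shrinking by a constant factor per step, which is the qualitative behaviour an exponential mixing rate captures.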
Findings
Proposed a new restless linear bandit model with dependent parameters.
Developed the LinMix-UCB algorithm with sub-linear regret guarantees.
Demonstrated robustness against long-range dependencies using coupling methods.
Abstract
A more general formulation of the linear bandit problem is considered to allow for dependencies over time. Specifically, it is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary mixing sequence of parameters which gives rise to pay-offs. This instance of the problem can be viewed as a generalization of both the classical linear bandits with iid noise and the finite-armed restless bandits. In light of the well-known computational hardness of optimal policies for restless bandits, an approximation is proposed whose error is shown to be controlled by the dependence between consecutive parameters. An optimistic algorithm, called LinMix-UCB, is proposed for the case where the parameter sequence has an exponential mixing rate. The proposed algorithm is shown to incur a sub-linear regret of $\mathcal{O}\left(\sqrt{d…
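To make the setting concrete, here is a minimal sketch on a hypothetical two-dimensional instance. Writing $\theta_t$ for the parameter at round $t$ (notation assumed here), the parameters follow a stationary vector AR(1) process around a mean $\mu$, pay-offs are $\langle a_t, \theta_t \rangle$ over a small finite action set, and an optimistic (UCB-style) rule builds a ridge-regression confidence ellipsoid only from rounds spaced `gap` steps apart, so that the samples it learns from are weakly dependent. This is not LinMix-UCB: the spacing rule, the constants `lam` and `beta`, the AR(1) model, and all names (`simulate_params`, `spaced_optimistic_play`) are illustrative assumptions, and the exploration width is not calibrated to any regret bound.

```python
import numpy as np


def simulate_params(mu, rho, sigma, n, rng):
    """Hypothetical stationary vector AR(1) around mu:
    theta_t = mu + rho * (theta_{t-1} - mu) + sigma * eps_t, with |rho| < 1."""
    d = mu.shape[0]
    # Start from the stationary law so the sequence is stationary from t = 0.
    theta = mu + rng.normal(0.0, sigma / np.sqrt(1.0 - rho**2), size=d)
    out = np.empty((n, d))
    for t in range(n):
        out[t] = theta
        theta = mu + rho * (theta - mu) + sigma * rng.normal(size=d)
    return out


def spaced_optimistic_play(actions, thetas, gap, lam=1.0, beta=1.0):
    """Hypothetical optimistic rule: ridge estimate of E[theta_t] built only from
    rounds t with t % gap == 0, so the retained samples are far apart in time."""
    n, d = thetas.shape
    V = lam * np.eye(d)  # regularised design matrix
    b = np.zeros(d)      # sum of reward * action over the retained rounds
    chosen = np.empty(n, dtype=int)
    for t in range(n):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b
        # Ellipsoidal width sqrt(a^T V^{-1} a) for every action a (one per row).
        widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
        i = int(np.argmax(actions @ theta_hat + beta * widths))
        chosen[t] = i
        reward = actions[i] @ thetas[t]  # pay-off <a_t, theta_t>, no extra noise
        if t % gap == 0:  # keep only spaced-out rounds for estimation
            V += np.outer(actions[i], actions[i])
            b += reward * actions[i]
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([1.0, 0.2])  # hypothetical mean parameter
    actions = np.array([[1.0, 0.0], [0.0, 1.0], [2**-0.5, 2**-0.5]])
    thetas = simulate_params(mu, rho=0.5, sigma=0.3, n=2000, rng=rng)
    chosen = spaced_optimistic_play(actions, thetas, gap=5)
    means = actions @ mu
    # Pseudo-regret against an oracle that always plays the action with the largest mean pay-off.
    regret = np.sum(means.max() - means[chosen])
    print("fraction optimal:", np.mean(chosen == 0), "pseudo-regret:", round(float(regret), 2))
```

Spacing the retained rounds trades sample size for weaker dependence: in this toy model, values `gap` steps apart have correlation $\rho^{\text{gap}}$ in each coordinate, so a larger `gap` gives nearly independent observations at the cost of fewer updates to the ellipsoid.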
Taxonomy
Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Smart Grid Energy Management
