On Slowly-varying Non-stationary Bandits
Ramakrishnan Krishnamurthy, Aditya Gopalan

TL;DR
This paper introduces a new algorithm for slowly-varying non-stationary bandits, proves the first instance-dependent regret upper bound for this setting, and shows that the algorithm is minimax optimal.
Contribution
The paper extends the Successive Elimination algorithm to the non-stationary setting and provides the first instance-dependent regret upper bound and the first minimax regret lower bound for slowly varying bandits.
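For readers unfamiliar with Successive Elimination, the following is a minimal sketch of the classic stationary version only. It is not the paper's non-stationary extension, whose details are not given on this page. The function name, the `pull` callback, the `delta_conf` parameter, and the Hoeffding-style confidence radius are illustrative assumptions, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def confidence_radius(n_pulls, n_arms, delta_conf):
    # Hoeffding-style radius with a union bound over arms and rounds
    # (one common textbook choice; an assumption, not taken from the paper).
    return math.sqrt(math.log(4 * n_arms * n_pulls ** 2 / delta_conf) / (2 * n_pulls))


def successive_elimination(pull, n_arms, horizon, delta_conf=0.05):
    """Classic stationary Successive Elimination.

    pull(arm) must return a reward in [0, 1].
    Returns (surviving arms, list of arms played in order).
    """
    active = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    played = []
    t = 0
    while t < horizon:
        # Play each surviving arm once per round, stopping at the horizon.
        for arm in list(active):
            if t >= horizon:
                break
            sums[arm] += pull(arm)
            counts[arm] += 1
            played.append(arm)
            t += 1
        if any(counts[a] == 0 for a in active):
            continue  # horizon ended before every arm was sampled once
        # Drop every arm whose upper bound is below the best lower bound.
        lcb = {a: sums[a] / counts[a] - confidence_radius(counts[a], n_arms, delta_conf)
               for a in active}
        ucb = {a: sums[a] / counts[a] + confidence_radius(counts[a], n_arms, delta_conf)
               for a in active}
        best_lcb = max(lcb.values())
        active = [a for a in active if ucb[a] >= best_lcb]
    return active, played
```

In a stationary problem an eliminated arm never needs to return. When mean rewards drift, a discarded arm can later become the best one, which is why the stationary procedure cannot be used unchanged.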
Findings
First instance-dependent regret upper bound established.
Minimax optimality of the proposed algorithm demonstrated.
The minimax lower bound matches that of the total variation-budgeted bandits problem.
Abstract
We consider minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that arms' rewards are stochastic and independent over time, but that the absolute difference between the expected rewards of any arm at any two consecutive time-steps is at most a drift limit δ > 0. For this setting that has not received enough attention in the past, we give a new algorithm which extends naturally the well-known Successive Elimination algorithm to the non-stationary bandit setting. We establish the first instance-dependent regret upper bound for slowly varying non-stationary bandits. The analysis in turn relies on a novel characterization of the instance as a detectable gap profile that depends on the expected arm reward differences. We also provide the first minimax regret lower bound for this problem, enabling us to show that our algorithm is…
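The setting in the abstract can be made concrete with a small simulation. This is a sketch under stated assumptions, not code from the paper: the random-walk drift model, the clipping of means to [0, 1], and the names `drifting_means`, `drift_limit`, and `dynamic_regret` are all hypothetical choices made for illustration. Dynamic regret is taken here in its usual sense, the sum over time of the gap between the best expected reward at that step and the expected reward of the arm actually played.

```python
import random


def drifting_means(n_arms, horizon, drift_limit, rng):
    """Generate mean-reward paths mu[t][arm] that move slowly.

    Consecutive means of each arm differ by at most drift_limit.
    The random-walk model is an illustrative assumption.
    """
    mu = [[rng.random() for _ in range(n_arms)]]
    for _ in range(horizon - 1):
        nxt = []
        for m in mu[-1]:
            step = rng.uniform(-drift_limit, drift_limit)
            # Clipping to [0, 1] cannot increase the per-step change,
            # because the previous mean already lies in [0, 1].
            nxt.append(min(1.0, max(0.0, m + step)))
        mu.append(nxt)
    return mu


def dynamic_regret(mu, arms_played):
    """Sum over time of (best mean at step t) - (mean of the arm played at t)."""
    return sum(max(mu_t) - mu_t[a] for mu_t, a in zip(mu, arms_played))
```

A policy that always plays the arm with the highest current mean has zero dynamic regret by this definition, while a policy that commits to one arm accumulates regret whenever the drift lets another arm overtake it.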
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Age of Information Optimization
