How to gamble with non-stationary $\mathcal{X}$-armed bandits and have no regrets

Valeriy Avanesov

arXiv:1908.07636·stat.ML·January 19, 2021

TL;DR

This paper introduces a new approach for non-stationary $\mathcal{X}$-armed bandits.

Contribution

It proposes a novel strategy for non-stationary $\mathcal{X}$-armed bandits.

Findings

01. Achieves sub-linear cumulative regret in non-stationary environments.

02. Nearly optimal performance in highly smooth reward settings.

03. Supported by theoretical proofs and experimental validation.

Abstract

In the $\mathcal{X}$-armed bandit problem, an agent sequentially interacts with an environment that yields a reward based on the vector input the agent provides. The agent's goal is to maximise the sum of these rewards over some number of time steps. The problem and its variations have been the subject of numerous studies, which have proposed strategies with sub-linear and sometimes optimal regret. This paper introduces a novel variation of the problem: we consider an environment that can abruptly change its behaviour an unknown number of times. For this setting we propose a novel strategy and prove that it attains sub-linear cumulative regret. Moreover, when the relation between an action and the corresponding reward is highly smooth, the method is nearly optimal. The theoretical results are supported by an experimental study.
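To make the setting concrete, the following is a minimal sketch of the problem only, not of the strategy proposed in the paper. It assumes a one-dimensional action space $\mathcal{X} = [0, 1]$, a quadratic mean reward, Gaussian noise, and known change points; all of these choices, as well as the names `make_environment` and `cumulative_regret`, are hypothetical and serve only to illustrate what cumulative regret means when the environment changes abruptly.

```python
import random


def make_environment(change_points, optima, noise_std=0.1, seed=0):
    """Hypothetical piecewise-stationary environment on X = [0, 1].

    Within each regime the mean reward is f(x) = 1 - (x - optimum)^2,
    a smooth function whose maximum value is 1. At every change point
    the optimum jumps to the next entry of `optima`.
    """
    rng = random.Random(seed)

    def mean_reward(t, x):
        # The number of change points at or before step t selects the regime.
        regime = sum(1 for c in change_points if t >= c)
        return 1.0 - (x - optima[regime]) ** 2

    def pull(t, x):
        # The agent only ever observes this noisy reward.
        return mean_reward(t, x) + rng.gauss(0.0, noise_std)

    return mean_reward, pull


def cumulative_regret(policy, mean_reward, pull, horizon):
    """Sum over time of (best mean reward) - (mean reward of the chosen action)."""
    regret = 0.0
    history = []
    for t in range(horizon):
        x = policy(t, history)
        history.append((x, pull(t, x)))
        # By construction the best achievable mean reward is 1.0 in every regime.
        regret += 1.0 - mean_reward(t, x)
    return regret


# One abrupt change at t = 50: the optimum moves from 0.2 to 0.8.
mean_reward, pull = make_environment(change_points=[50], optima=[0.2, 0.8])

# A policy that found the first optimum but never adapts to the change.
fixed_policy = lambda t, history: 0.2
print(cumulative_regret(fixed_policy, mean_reward, pull, horizon=100))
```

In this toy example the non-adaptive policy pays nothing before the change and $(0.8 - 0.2)^2 = 0.36$ per step afterwards, so its regret is approximately $50 \times 0.36 = 18$ and would keep growing linearly with the horizon. A strategy with sub-linear cumulative regret has to detect such changes and re-explore.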

Taxonomy

Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms