Kiefer Wolfowitz Algorithm is Asymptotically Optimal for a Class of Non-Stationary Bandit Problems

Rahul Singh; Taposh Banerjee

arXiv:1702.08000 · stat.ML · March 9, 2017 · 1 citation

TL;DR

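This paper shows that two variants of the Kiefer Wolfowitz algorithm, one with a fixed step size and one with a sliding window, are asymptotically optimal for a class of non-stationary bandit problems in which, among other conditions, the unknown reward function changes only o(T) times over a horizon of length T.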

Contribution

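It introduces and analyzes two variants of the Kiefer Wolfowitz algorithm for non-stationary bandits, one with a fixed step size β and one with a sliding window of length L, and proves that both are asymptotically efficient under conditions that include a bound on how often the reward function changes.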
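To make the first variant concrete, here is a minimal Python sketch of KW with a fixed step size β, used for maximization over a box in R^d. It is not the authors' implementation. Assumptions: the box [lo, hi]^d stands in for the convex, compact action set; the perturbation width `c` is held fixed; all 2d queries of one iteration see the same f_s; and the names `noisy_f`, `kw_gradient`, and `kw_fixed_step` are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
# ASSUMPTION (hypothetical interface): noisy_f(s, x) returns a noisy sample of f_s(x).
NoisyOracle = Callable[[int, Vector], float]


def clip_to_box(x: Sequence[float], lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^d, our stand-in convex, compact set."""
    return [min(max(v, lo), hi) for v in x]


def kw_gradient(noisy_f: NoisyOracle, s: int, x: Vector, c: float) -> Vector:
    """Central finite-difference estimate of the gradient of f_s at x (2d noisy queries)."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += c
        minus[i] -= c
        grad.append((noisy_f(s, plus) - noisy_f(s, minus)) / (2.0 * c))
    return grad


def kw_fixed_step(noisy_f: NoisyOracle, x0: Sequence[float], steps: int,
                  beta: float, c: float, lo: float, hi: float) -> List[Vector]:
    """KW ascent with a constant step size beta; returns the iterates x_0, ..., x_steps."""
    # Keep iterates in [lo + c, hi - c] so every query x +/- c*e_i stays inside [lo, hi].
    x = clip_to_box(x0, lo + c, hi - c)
    path = [list(x)]
    for s in range(steps):
        g = kw_gradient(noisy_f, s, x, c)
        # beta never decays, so the iterate keeps moving after f_s changes.
        x = clip_to_box([xi + beta * gi for xi, gi in zip(x, g)], lo + c, hi - c)
        path.append(list(x))
    return path


if __name__ == "__main__":
    rng = random.Random(0)

    def noisy_f(s: int, x: Vector) -> float:
        # Toy piecewise-stationary objective: the maximizer jumps once, at s = 200.
        theta = [0.5, -0.3] if s < 200 else [-0.4, 0.2]
        return -sum((xi - ti) ** 2 for xi, ti in zip(x, theta)) + rng.gauss(0.0, 0.05)

    path = kw_fixed_step(noisy_f, [0.0, 0.0], steps=400, beta=0.1, c=0.1, lo=-1.0, hi=1.0)
    print("before change:", [round(v, 2) for v in path[200]])
    print("after change: ", [round(v, 2) for v in path[-1]])
```

The design point is the constant `beta`: classical KW uses step sizes that decay to zero, which would eventually freeze the iterate, whereas a fixed step keeps it responsive after the jump at s = 200, at the cost of never averaging the noise out completely.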

Findings

01

The regret of both algorithms is o(T) when the number of times the reward function changes during time T is o(T).
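Read together with the abstract, this finding can be written out as follows. This is our own transcription, not the paper's notation: $\mathcal{X}$ (the action set), $X_s$ (the action the algorithm plays at time $s$), and $N(T)$ (the number of changes) are assumed symbols, and the paper's full result also depends on conditions that are cut off in the abstract below.

```latex
R(T) = \sum_{s=1}^{T} \Big( f_s(x_s) - \mathbb{E}\big[ f_s(X_s) \big] \Big),
\qquad x_s \in \arg\max_{x \in \mathcal{X}} f_s(x),

N(T) = \big|\{\, s < T : f_{s+1} \neq f_s \,\}\big|,
\qquad
N(T) = o(T) \;\Longrightarrow\; R(T) = o(T), \quad \text{i.e. } R(T)/T \to 0 .
```

Sublinear regret means the average per-step gap to the clairvoyant strategy vanishes as $T$ grows.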

02

With optimally tuned learning rates, the algorithms are asymptotically efficient.

03

Algorithms adapt effectively to slowly changing reward functions.

Abstract

We consider the problem of designing an allocation rule or an "online learning algorithm" for a class of bandit problems in which the set of control actions available at each time $s$ is a convex, compact subset of $\mathbb{R}^{d}$. Upon choosing an action $x$ at time $s$, the algorithm obtains a noisy value of the unknown and time-varying function $f_{s}$ evaluated at $x$. The "regret" of an algorithm is the gap between its expected reward and the reward earned by a strategy that knows the function $f_{s}$ at each time $s$ and hence chooses the action $x_{s}$ that maximizes $f_{s}$. For this non-stationary bandit set-up, we consider two variants of the Kiefer Wolfowitz (KW) algorithm: i) KW with a fixed step size $\beta$, and ii) KW with a sliding window of length $L$. We show that if the number of times that the function $f_{s}$ varies during time $T$ is $o(T)$, and if the…
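The abstract does not spell out how the sliding window of length $L$ enters the KW update. The sketch below assumes one plausible reading, in which each step follows the average of the last $L$ finite-difference gradient estimates, so estimates older than the window stop influencing the iterate; the paper may use the window differently. Here `window` plays the role of $L$, while the step size `a`, the perturbation width `c`, the box action set, and all function names are hypothetical.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence

Vector = List[float]
# ASSUMPTION (hypothetical interface): noisy_f(s, x) returns a noisy sample of f_s(x).
NoisyOracle = Callable[[int, Vector], float]


def clip_to_box(x: Sequence[float], lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^d, our stand-in convex, compact set."""
    return [min(max(v, lo), hi) for v in x]


def kw_gradient(noisy_f: NoisyOracle, s: int, x: Vector, c: float) -> Vector:
    """Central finite-difference estimate of the gradient of f_s at x (2d noisy queries)."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += c
        minus[i] -= c
        grad.append((noisy_f(s, plus) - noisy_f(s, minus)) / (2.0 * c))
    return grad


def kw_sliding_window(noisy_f: NoisyOracle, x0: Sequence[float], steps: int,
                      window: int, a: float, c: float, lo: float, hi: float) -> List[Vector]:
    """ASSUMED sliding-window KW: step along the mean of the last `window` gradient estimates."""
    x = clip_to_box(x0, lo + c, hi - c)
    recent: Deque[Vector] = deque(maxlen=window)  # estimates older than `window` steps drop out
    path = [list(x)]
    for s in range(steps):
        recent.append(kw_gradient(noisy_f, s, x, c))
        # Coordinate-wise average over the (at most `window`) stored estimates.
        avg = [sum(g[i] for g in recent) / len(recent) for i in range(len(x))]
        x = clip_to_box([xi + a * gi for xi, gi in zip(x, avg)], lo + c, hi - c)
        path.append(list(x))
    return path


if __name__ == "__main__":
    rng = random.Random(0)

    def noisy_f(s: int, x: Vector) -> float:
        # Toy piecewise-stationary objective: the maximizer jumps once, at s = 200.
        theta = [0.5, -0.3] if s < 200 else [-0.4, 0.2]
        return -sum((xi - ti) ** 2 for xi, ti in zip(x, theta)) + rng.gauss(0.0, 0.05)

    path = kw_sliding_window(noisy_f, [0.0, 0.0], steps=400, window=5,
                             a=0.1, c=0.1, lo=-1.0, hi=1.0)
    print("before change:", [round(v, 2) for v in path[200]])
    print("after change: ", [round(v, 2) for v in path[-1]])
```

Under this reading, a larger `window` smooths more noise but lets stale gradients from before a change linger longer; with `window=1` the sketch reduces to fixed-step KW with step size `a`.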

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Optimization and Search Problems