Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits

Sergey Garbar

arXiv:2112.06423·cs.LG·May 12, 2023

Open Access

TL;DR

This paper develops a stochastic differential equation framework to describe the limiting behavior of the UCB algorithm in Gaussian multi-armed bandits, validated through Monte Carlo simulations.

Contribution

It introduces a description of the UCB strategy for Gaussian multi-armed bandits with a known control horizon, based on a system of stochastic and ordinary differential equations, that characterizes the strategy's limiting behavior.

Findings

01

The model accurately predicts the normalized regret for close reward distributions, where the mean rewards differ by a magnitude of order $N^{-1/2}$.

02

Monte Carlo simulations confirm the validity of the stochastic differential equation description.

03

The minimal control horizon size at which the normalized regret is not noticeably larger than its maximum possible value is estimated.

Abstract

We consider the upper confidence bound strategy for Gaussian multi-armed bandits with known control horizon sizes $N$ and build its limiting description with a system of stochastic differential equations and ordinary differential equations. Rewards for the arms are assumed to have unknown expected values and known variances. A set of Monte-Carlo simulations was performed for the case of close distributions of rewards, when mean rewards differ by the magnitude of order $N^{-1/2}$, as it yields the highest normalized regret, to verify the validity of the obtained description. The minimal size of the control horizon when the normalized regret is not noticeably larger than maximum possible was estimated.
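The Monte Carlo setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's code, and several details are assumptions made here for illustration: two arms with means $\pm d\,(D/N)^{1/2}$ and common known variance $D$, a textbook UCB index of the form mean $+\ a\sqrt{D \ln n / n_l}$ with a tunable parameter `a`, and regret normalized by $(DN)^{1/2}$. The function name and all parameter names are hypothetical.

```python
import numpy as np


def normalized_regret_ucb(N, d, a=1.0, D=1.0, n_runs=50, seed=0):
    """Monte Carlo estimate of the normalized regret of a UCB rule.

    Assumptions (not taken from the paper): two Gaussian arms with means
    +d*sqrt(D/N) and -d*sqrt(D/N), known variance D, and the index
    mean_l + a*sqrt(D*ln(n)/n_l), where n is the number of pulls so far.
    """
    rng = np.random.default_rng(seed)
    means = np.array([d, -d]) * np.sqrt(D / N)  # "close" reward distributions
    sigma = np.sqrt(D)
    total = 0.0
    for _ in range(n_runs):
        sums = np.zeros(2)    # cumulative reward per arm
        counts = np.zeros(2)  # number of pulls per arm
        # Initial stage: pull each arm once so the indices are defined.
        for arm in range(2):
            sums[arm] += rng.normal(means[arm], sigma)
            counts[arm] += 1
        # Main stage: pull the arm with the largest upper confidence bound.
        for n in range(2, N):
            index = sums / counts + a * np.sqrt(D * np.log(n) / counts)
            arm = int(np.argmax(index))
            sums[arm] += rng.normal(means[arm], sigma)
            counts[arm] += 1
        # Each pull of the worse arm (arm 1) loses 2*d*sqrt(D/N);
        # dividing the total loss by sqrt(D*N) gives 2*d*counts[1]/N.
        total += 2.0 * d * counts[1] / N
    return total / n_runs


if __name__ == "__main__":
    # Scan the scaled gap d for a fixed horizon N.
    for d in (0.5, 1.5, 3.0):
        print(d, normalized_regret_ucb(N=500, d=d))
```

Under these assumptions, scanning `d` for several values of `N` is one way to see how the worst-case normalized regret changes with the horizon, which is the kind of comparison the abstract describes.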

Taxonomy

Topics: Advanced Bandit Algorithms Research · Forecasting Techniques and Applications · Distributed Sensor Networks and Detection Algorithms