Stochastic Bandit Based on Empirical Moments

Junya Honda; Akimichi Takemura

arXiv:1105.2879 · math.ST · March 29, 2013 · AISTATS · 1 citation

TL;DR

This paper proposes a stochastic bandit policy that uses the first d empirical moments of each arm's rewards, for a d fixed in advance. The asymptotic upper bound on its regret approaches the theoretical bound of Burnetas and Katehakis as d increases, so the choice of d trades computational complexity against expected regret.

Contribution

It generalizes existing policies based on empirical variances (second moments) to policies that use the first d empirical moments, letting the choice of d balance computational complexity against expected regret.

Findings

01

The asymptotic upper bound on the regret approaches the theoretical bound of Burnetas and Katehakis as the number of moments d increases.

02

Choosing d appropriately trades computational complexity against expected regret.

03

The policy generalizes variance-based policies, which use empirical second moments, to the first d empirical moments.

Abstract

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a known bounded interval, e.g. [0,1]. For this model, policies which take into account the empirical variances (i.e. second moments) of the arms are known to perform effectively. In this paper, we generalize this idea and we propose a policy which exploits the first d empirical moments for arbitrary d fixed in advance. The asymptotic upper bound of the regret of the policy approaches the theoretical bound by Burnetas and Katehakis as d increases. By choosing appropriate d, the proposed policy realizes a tradeoff between the computational complexity and the expected regret.
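
To make the statistic concrete, here is a minimal Python sketch of the first d empirical moments of one arm's observed rewards, assuming rewards lie in [0,1]. It is only an illustration of the quantity the abstract refers to, not the authors' policy: the abstract does not say how the moments are turned into an arm-selection rule, and the name empirical_moments and the sample rewards are hypothetical.

```python
def empirical_moments(rewards, d):
    """Return [m_1, ..., m_d], where m_k is the average of r**k
    over the rewards observed so far from a single arm."""
    if d < 1:
        raise ValueError("d must be at least 1")
    if not rewards:
        raise ValueError("need at least one observed reward")
    n = len(rewards)
    return [sum(r ** k for r in rewards) / n for k in range(1, d + 1)]


# Hypothetical rewards observed from one arm, all inside [0, 1].
rewards = [0.0, 0.5, 1.0, 0.5]
moments = empirical_moments(rewards, d=3)

# With d = 2 the same information is carried by the empirical mean and
# variance, which is what variance-based policies use.
mean = moments[0]
variance = moments[1] - moments[0] ** 2
```

Computing all d moments costs O(d) work per observed reward, which is one simple way to see why a larger d increases the computational cost.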

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Reinforcement Learning in Robotics