Bandits for BMO Functions

Tianyu Wang; Cynthia Rudin

arXiv:2007.08703 · cs.LG · July 20, 2020 · 1 citation

TL;DR

This paper introduces a bandit algorithm for expected-reward functions of Bounded Mean Oscillation (BMO), which may be discontinuous and unbounded, and shows that it achieves poly-logarithmic δ-regret: regret measured against an arm that is optimal after removing a δ-sized portion of the arm space.
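To make the δ-regret benchmark concrete, here is a minimal Monte Carlo sketch. It assumes a one-dimensional arm space [0, 1] with the uniform measure and a hypothetical reward f(x) = -log|x - 0.5|, which is unbounded above, so ordinary regret against sup f = +∞ is meaningless. We read "optimal after removing a δ-sized portion" as f*_δ = inf{z : μ({x : f(x) > z}) ≤ δ}, which equals the (1 - δ)-quantile of f(U) for U uniform on [0, 1]. This formalization and the names `reward_mean`, `delta_optimal_value`, and `delta_regret` are our own assumptions, not code from the paper.

```python
import numpy as np


def reward_mean(x):
    # Hypothetical mean reward: f(x) = -log|x - 0.5| on [0, 1].
    # It blows up at x = 0.5, so sup f = +inf and plain regret is undefined.
    return -np.log(np.abs(x - 0.5))


def delta_optimal_value(f, delta, n_samples=1_000_000, seed=0):
    """Monte Carlo estimate of f*_delta = inf{z : mu({x : f(x) > z}) <= delta},
    with mu the uniform (Lebesgue) measure on [0, 1] (assumed arm space)."""
    rng = np.random.default_rng(seed)
    values = f(rng.uniform(0.0, 1.0, size=n_samples))
    # The smallest z whose upper level set has measure <= delta is the
    # (1 - delta)-quantile of f(U) with U ~ Uniform[0, 1].
    return float(np.quantile(values, 1.0 - delta))


def delta_regret(f, arms_played, delta, **kwargs):
    """Cumulative delta-regret: sum over rounds t of (f*_delta - f(a_t))."""
    f_star = delta_optimal_value(f, delta, **kwargs)
    return float(np.sum(f_star - f(np.asarray(arms_played, dtype=float))))


if __name__ == "__main__":
    print(delta_optimal_value(reward_mean, 0.01))
```

For this particular f, the level set {f > z} is an interval of length 2e^(-z), so f*_δ = log(2/δ) in closed form; with δ = 0.01 the estimate should land near log 200 ≈ 5.30, a finite benchmark even though the maximum reward is infinite.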

Contribution

It develops a toolset and an algorithm for BMO bandits, extending bandit analysis to reward functions that may be discontinuous and unbounded.

Findings

01. Achieves poly-logarithmic δ-regret for BMO bandits.
02. Provides a new toolset for analyzing bandits with irregular reward functions.
03. Extends bandit theory to discontinuous and unbounded reward functions.
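The point that a BMO function can be unbounded is easy to check numerically on the textbook example log|x|. The sketch below approximates the mean oscillation (1/|I|) ∫_I |f - f_I| dx, where f_I is the average of f over the interval I, using the midpoint rule. The helper names `mean_oscillation` and `log_abs` are hypothetical; this is an illustration of the BMO condition, not part of the paper's toolset.

```python
import numpy as np


def mean_oscillation(f, a, b, n=1_000_000):
    """Approximate (1/|I|) * integral over I of |f(x) - f_I| dx on I = [a, b],
    where f_I is the mean of f over I, using the midpoint rule."""
    h = (b - a) / n
    x = a + (np.arange(n) + 0.5) * h  # midpoints; never hit the endpoints
    fx = f(x)
    f_avg = fx.mean()                 # approximates f_I
    return float(np.abs(fx - f_avg).mean())


def log_abs(x):
    # log|x| is unbounded near 0, yet it is a standard example of a BMO function.
    return np.log(np.abs(x))


if __name__ == "__main__":
    for a, b in [(0.0, 1e-6), (0.0, 1.0), (-1.0, 1.0), (-1e3, 1e3), (1.0, 2.0)]:
        print(f"[{a:g}, {b:g}]  mean oscillation ~ {mean_oscillation(log_abs, a, b):.4f}")
```

Intervals of the form [0, r] or [-r, r] all give about 2/e ≈ 0.736, whatever the scale r, because substituting x = rt reduces the oscillation to ∫_0^1 |log t + 1| dt = 2/e. The oscillation stays bounded even though |log|x|| does not, which is exactly the BMO condition.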

Abstract

We study the bandit problem where the underlying expected reward is a Bounded Mean Oscillation (BMO) function. BMO functions are allowed to be discontinuous and unbounded, and are useful in modeling signals with infinities in the domain. We develop a toolset for BMO bandits, and provide an algorithm that can achieve poly-log $\delta$-regret: a regret measured against an arm that is optimal after removing a $\delta$-sized portion of the arm space.
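For reference, the standard BMO seminorm and one possible formalization of the δ-regret benchmark are written out below. The symbols (arm space $\mathcal{A}$, Lebesgue measure $\mu$, hypercubes $Q \subseteq \mathcal{A}$, played arms $a_t$, horizon $T$) are our labels, and the δ-regret formula is our reading of the abstract's wording rather than a quotation from the paper.

```latex
\|f\|_{\mathrm{BMO}} = \sup_{Q \subseteq \mathcal{A}} \frac{1}{\mu(Q)} \int_{Q} \lvert f(a) - f_Q \rvert \, d\mu(a),
\qquad f_Q = \frac{1}{\mu(Q)} \int_{Q} f \, d\mu,
\qquad f \in \mathrm{BMO} \iff \|f\|_{\mathrm{BMO}} < \infty

f^{*}_{\delta} = \inf\bigl\{ z \in \mathbb{R} : \mu(\{ a \in \mathcal{A} : f(a) > z \}) \le \delta \bigr\},
\qquad
R^{\delta}_{T} = \sum_{t=1}^{T} \bigl( f^{*}_{\delta} - f(a_t) \bigr)
```

Because $z \mapsto \mu(\{f > z\})$ is non-increasing and right-continuous, the infimum is attained whenever $f^{*}_{\delta}$ is finite: the set $\{f > f^{*}_{\delta}\}$ has measure at most $\delta$, and once it is removed no remaining arm has mean reward above $f^{*}_{\delta}$. That is the sense in which the benchmark arm is optimal after removing a $\delta$-sized portion of the arm space.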

Videos

Bandits for BMO Functions (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research