Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality

Kwang-Sung Jun; Chicheng Zhang

arXiv:2006.08754 · cs.LG · October 26, 2020 · 5 cites


TL;DR

This paper introduces CROP, a novel algorithm for structured bandits that achieves asymptotic optimality and adapts to bounded regret, overcoming limitations of optimistic algorithms and improving performance in finite-time regimes.

Contribution

The paper proposes CROP, the first algorithm to attain asymptotic optimality while adapting to bounded regret in structured bandits, eliminating optimistic hypotheses with a pessimistic approach.

Findings

01 CROP achieves constant-factor asymptotic optimality.

02 CROP adapts to bounded regret, scaling with an effective number of arms.

03 CROP can outperform existing algorithms exponentially in certain nonasymptotic regimes.

Abstract

We study stochastic structured bandits for minimizing regret. The fact that the popular optimistic algorithms do not achieve asymptotic instance-dependent regret optimality (asymptotic optimality for short) has recently drawn researchers' attention. On the other hand, it is known that one can achieve bounded regret (i.e., regret that does not grow indefinitely with $n$) in certain instances. Unfortunately, existing asymptotically optimal algorithms rely on forced sampling that introduces an $\omega(1)$ term w.r.t. the time horizon $n$ in their regret, failing to adapt to the "easiness" of the instance. In this paper, we focus on the finite hypothesis case and ask if one can achieve asymptotic optimality while enjoying bounded regret whenever possible. We provide a positive answer by introducing a new algorithm called CRush Optimism with Pessimism (CROP) that eliminates optimistic hypotheses by…
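For reference, the regret notions contrasted in the abstract can be written out in the standard way. This is a sketch under assumed notation, not notation taken from the paper: $\mu_a$ (mean reward of arm $a$ under the true hypothesis), $\mu^\ast$ (best mean), $A_t$ (arm pulled at round $t$), and $c^\ast$ (the instance-dependent constant from the asymptotic lower bound) are all introduced here for illustration.

```latex
% Assumed notation (illustrative, not from the paper):
%   \mu_a   : mean reward of arm a under the true hypothesis
%   \mu^*   : \max_a \mu_a
%   A_t     : arm pulled at round t
%   c^*     : instance-dependent constant in the asymptotic lower bound
\begin{align*}
  \mathrm{Reg}_n
    &= \sum_{t=1}^{n} \bigl(\mu^\ast - \mu_{A_t}\bigr)
    && \text{(cumulative regret over horizon } n\text{)} \\
  \limsup_{n \to \infty} \frac{\mathbb{E}[\mathrm{Reg}_n]}{\ln n}
    &\le c^\ast
    && \text{(asymptotic optimality)} \\
  \sup_{n \ge 1} \mathbb{E}[\mathrm{Reg}_n]
    &< \infty
    && \text{(bounded regret)}
\end{align*}
```

Under these definitions, a forced-sampling term that is $\omega(1)$ in $n$ rules out the third property even on instances where bounded regret is attainable, which is the gap the abstract describes.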


Videos

Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics