An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits

Yevgeny Seldin; Gábor Lugosi

arXiv:1702.06103 · cs.LG · May 10, 2017 · 28 cites

TL;DR

This paper combines a new gap estimation strategy with the EXP3++ algorithm for multi-armed bandits, reducing the dependence of regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ in the stochastic regime while leaving the adversarial regret guarantee unchanged.

Contribution

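It proposes a new gap estimation strategy for randomized multi-armed bandit algorithms and combines it with the EXP3++ algorithm of Seldin and Slivkins (2014), improving the regret bound in the stochastic regime without changing the guarantee in the adversarial regime.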

Findings

01

Reduced the dependence of regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ in the stochastic regime

02

Eliminated an additive regret term of order $\Delta e^{1/\Delta^{2}}$, where $\Delta$ is the minimal gap of the problem instance

03

Left the existing regret guarantee in the adversarial regime unchanged

Abstract

We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces the dependence of regret on the time horizon from $(\ln t)^{3}$ to $(\ln t)^{2}$ and eliminates an additive factor of order $\Delta e^{1/\Delta^{2}}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime the regret guarantee remains unchanged.
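To give a feel for the two improvements stated in the abstract, the following Python sketch evaluates the quantities involved. It is only an order-of-magnitude illustration: the function names `log_power_ratio` and `additive_term` are hypothetical, and the constants and other problem-dependent factors that appear in the actual regret bounds (for example the number of arms) are deliberately ignored.

```python
import math


def log_power_ratio(t: int) -> float:
    """Ratio (ln t)^3 / (ln t)^2, which simplifies to ln t."""
    return math.log(t) ** 3 / math.log(t) ** 2


def additive_term(gap: float) -> float:
    """Value of gap * exp(1 / gap^2), the order of the eliminated additive term.

    Constants are omitted; this is the order of magnitude only.
    """
    return gap * math.exp(1.0 / gap ** 2)


if __name__ == "__main__":
    # Time-horizon dependence: the improvement factor is ln t.
    for t in (10**3, 10**6, 10**9):
        print(f"t={t:>10}: (ln t)^2={math.log(t) ** 2:8.1f}  "
              f"(ln t)^3={math.log(t) ** 3:9.1f}  "
              f"ratio={log_power_ratio(t):5.2f}")

    # Additive term: grows extremely fast as the minimal gap shrinks.
    for gap in (0.5, 0.2, 0.1):
        print(f"gap={gap}: gap * exp(1/gap^2) = {additive_term(gap):.3e}")
```

The first loop shows that moving from $(\ln t)^{3}$ to $(\ln t)^{2}$ saves a factor of $\ln t$, which grows slowly but without bound. The second loop shows why removing the $\Delta e^{1/\Delta^{2}}$ term matters for small gaps: the exponent $1/\Delta^{2}$ is already 100 at $\Delta = 0.1$.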

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Fault Detection and Control Systems