Restless dependent bandits with fading memory

Oleksandr Zadorozhnyi; Gilles Blanchard; Alexandra Carpentier

arXiv:1906.10454·stat.ML·June 26, 2019

TL;DR

This paper extends multi-armed bandit algorithms to dependent data generated by weak mixing processes, showing that regret bounds are similar to the i.i.d. case under certain mixing conditions.

Contribution

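It introduces a $\mathcal{C}$-Mix Improved UCB algorithm and analyzes its regret for dependent arm samples. In the slow-mixing scenario the regret upper bound matches the independent case up to an additive term that does not depend on the number of arms.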

Findings

01. Regret bounds similar to the i.i.d. case in the fast-mixing scenario

02. Additive regret term in the slow-mixing scenario, independent of the number of arms

03. Lower bounds matching the upper bounds up to a $\log(T)$ factor

Abstract

We study the stochastic multi-armed bandit problem in the case when the arm samples are dependent over time and generated from so-called weak $\mathcal{C}$-mixing processes. We establish a $\mathcal{C}$-Mix Improved UCB algorithm and provide both problem-dependent and problem-independent regret analysis in two different scenarios. In the first, so-called fast-mixing scenario, we show that the pseudo-regret enjoys the same upper bound (up to a factor) as for i.i.d. observations. In the second, slow-mixing scenario, we discover a surprising effect: the regret upper bound is similar to the independent case, with an incremental additive term which does not depend on the number of arms. The analysis of the slow-mixing scenario is supported by a minimax lower bound, which (up to a $\log(T)$ factor) matches the obtained upper bound.
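For orientation, the elimination scheme that the name "Improved UCB" refers to can be sketched for the plain i.i.d. setting. The abstract does not spell out the $\mathcal{C}$-Mix variant, so the code below is not the paper's algorithm: it is a minimal sketch of the classical round-based arm-elimination procedure, assuming rewards in [0, 1] and a horizon of at least 3. The function name `improved_ucb` and the reward oracle `pull` are hypothetical names chosen for illustration.

```python
import math


def improved_ucb(pull, n_arms, horizon):
    """Round-based arm elimination for i.i.d. rewards in [0, 1].

    `pull(arm)` is a hypothetical reward oracle (an assumption, not from the paper).
    Returns (pull counts per arm, list of arms still active at the end).
    """
    active = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    t = 0
    gap = 1.0  # current guess for the sub-optimality gap, halved each round
    max_round = int(0.5 * math.log2(horizon / math.e))

    for _ in range(max_round + 1):
        if len(active) == 1 or t >= horizon:
            break
        log_term = math.log(horizon * gap ** 2)
        n_m = math.ceil(2 * log_term / gap ** 2)
        # Bring every active arm up to n_m pulls in total (budget permitting).
        for arm in active:
            while counts[arm] < n_m and t < horizon:
                sums[arm] += pull(arm)
                counts[arm] += 1
                t += 1
        if t >= horizon:
            break
        # Drop arms whose upper confidence bound is below the best lower bound.
        radius = math.sqrt(log_term / (2 * n_m))
        means = {a: sums[a] / counts[a] for a in active}
        best_lower = max(means.values()) - radius
        active = [a for a in active if means[a] + radius >= best_lower]
        gap /= 2

    # Spend any remaining budget on the empirically best surviving arm.
    if t < horizon:
        if len(active) == 1:
            best = active[0]
        else:
            best = max(active, key=lambda a: sums[a] / counts[a])
        while t < horizon:
            sums[best] += pull(best)
            counts[best] += 1
            t += 1
    return counts, active


if __name__ == "__main__":
    # Deterministic toy rewards, used only to exercise the sketch.
    counts, active = improved_ucb(lambda a: [0.2, 0.8][a], 2, 5000)
    print(counts, active)
```

With dependent samples the confidence radius used here is no longer justified by a standard i.i.d. concentration inequality; how the radius and round lengths are adapted to weak $\mathcal{C}$-mixing is the subject of the paper and is not reproduced in this sketch.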

Taxonomy

Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Game Theory and Applications