Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards

Omar Besbes; Yonatan Gur; Assaf Zeevi

arXiv:1405.3316·cs.LG·June 11, 2019

TL;DR

This paper studies the exploration-exploitation trade-off in multi-armed bandit problems whose reward distributions change over time, and characterizes the achievable regret as a function of how much the rewards vary.

Contribution

It introduces a framework that links reward variation to regret bounds, bridging stochastic and adversarial bandit theories.

Findings

01. Characterizes regret as a function of reward variation.

02. Establishes a connection between stochastic and adversarial bandit frameworks.

03. Provides a mathematical foundation for non-stationary reward analysis.
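
To make "reward variation" and "regret" concrete, the quantities involved can be sketched as follows. The notation is an assumption based on the formulation in the full paper, which this page does not reproduce: \mu_t^k denotes the expected reward of arm k at round t, V_T is a budget on the total variation of the expected rewards, and regret is measured against a dynamic oracle that plays the best arm at every round.

```latex
% Assumed notation: \mu_t^k is the expected reward of arm k at round t.
% Variation budget: the total change in expected rewards over the horizon is at most V_T.
\sum_{t=1}^{T-1} \sup_{k \in \{1,\dots,K\}} \left| \mu_t^k - \mu_{t+1}^k \right| \le V_T

% Regret of a policy \pi against the dynamic oracle (best arm at every round);
% \pi_t is the arm the policy selects at round t.
R^{\pi}(T) = \sum_{t=1}^{T} \mu_t^{*} - \mathbb{E}^{\pi}\!\left[ \sum_{t=1}^{T} \mu_t^{\pi_t} \right],
\qquad \mu_t^{*} = \max_{k \in \{1,\dots,K\}} \mu_t^k
```

Under this formulation, the full paper reports that the best achievable worst-case regret is of order (K V_T)^{1/3} T^{2/3}, so regret grows sublinearly in T whenever the variation budget V_T does.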

Abstract

In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler's objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about arms (exploration) while simultaneously optimizing immediate rewards (exploitation); the price paid due to this trade off is often referred to as the regret, and the main question is how small can this price be as a function of the horizon length T. This problem has been studied extensively when the reward distributions do not change over time; an assumption that supports a sharp characterization of the regret, yet is often violated in practical settings. In this paper, we focus on a MAB formulation which…
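
The setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: the Bernoulli reward model, the `epsilon_greedy` policy, and the function names `simulate` and `variation` are all assumptions chosen for illustration. Regret here is measured against the best arm at each round, which is one natural benchmark when rewards change over time.

```python
import random


def variation(means):
    """Total variation of expected rewards: sum over rounds of the largest
    per-arm change between consecutive rounds (illustrative definition)."""
    return sum(
        max(abs(a - b) for a, b in zip(prev, nxt))
        for prev, nxt in zip(means, means[1:])
    )


def simulate(means, policy, seed=0):
    """Play one pass over the horizon and return the expected regret
    against the best arm at each round.

    means[t][k] is the (hypothetical) expected reward of arm k at round t;
    rewards are drawn as Bernoulli(means[t][k]).
    """
    rng = random.Random(seed)
    num_arms = len(means[0])
    counts = [0] * num_arms   # number of pulls per arm
    sums = [0.0] * num_arms   # total observed reward per arm
    regret = 0.0
    for t, mu in enumerate(means):
        arm = policy(t, counts, sums, rng)
        # Only the selected arm's reward is observed.
        reward = 1.0 if rng.random() < mu[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += max(mu) - mu[arm]
    return regret


def epsilon_greedy(eps):
    """Explore a uniformly random arm with probability eps, otherwise
    exploit the arm with the highest empirical mean so far."""
    def policy(t, counts, sums, rng):
        if 0 in counts:  # pull every arm once before exploiting
            return counts.index(0)
        if rng.random() < eps:
            return rng.randrange(len(counts))
        return max(range(len(counts)), key=lambda k: sums[k] / counts[k])
    return policy


if __name__ == "__main__":
    # Two arms whose expected rewards swap halfway through the horizon.
    means = [[0.9, 0.1]] * 50 + [[0.1, 0.9]] * 50
    print("variation:", variation(means))
    print("regret:", simulate(means, epsilon_greedy(0.1)))
```

Because the empirical means in this sketch average over the whole history, the policy is slow to react once the arms swap; discarding old observations is one way to trade some exploitation for responsiveness.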

Code & Models

Repositories

Chryzanthemum/slots