Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret

Orestis Papadigenopoulos; Constantine Caramanis; Sanjay Shakkottai

arXiv:2205.14790 · cs.LG · October 13, 2022 · 1 citation

TL;DR

This paper introduces an improved polynomial-time approximation algorithm for planning in non-stationary multi-armed bandits with recharging payoffs, achieving sublinear regret in the unknown payoff setting.

Contribution

It develops a polynomial-time $(1 - 1/e)$-approximation algorithm for planning with known mean payoff functions, improving on the previous $1/4$-approximation, and extends it to the bandit setting with sublinear regret.

Findings

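01 Achieves a polynomial-time $(1 - 1/e)$-approximation (asymptotically and in expectation) for the planning problem with known mean payoff functions.

02 Provides a bandit algorithm with sublinear regret for the setting where the payoffs are unknown.

03 Improves upon the previous state of the art, a $1/4$-approximation for the case where at most one arm can be played per round.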

Abstract

The stochastic multi-armed bandit setting has been recently studied in the non-stationary regime, where the mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played. This model captures natural behavioral aspects of the users which crucially determine the performance of recommendation platforms, ad placement systems, and more. Even assuming prior knowledge of the mean payoff functions, computing an optimal planning in the above model is NP-hard, while the state-of-the-art is a $1/4$-approximation algorithm for the case where at most one arm can be played per round. We first focus on the setting where the mean payoff functions are known. In this setting, we significantly improve the best-known guarantees for the planning problem by developing a polynomial-time $(1 - 1/e)$-approximation algorithm (asymptotically and in expectation),…
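To make the recharging model in the abstract concrete, here is a minimal Python sketch of the known-payoff planning setting with one arm played per round. It is not the paper's $(1 - 1/e)$-approximation algorithm: `greedy_schedule` is a naive myopic baseline included only to show why planning is non-trivial. The function names, the two example payoff functions, and the convention that every arm starts with a delay of 1 are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# A mean payoff function maps "rounds since last play" (>= 1) to a mean payoff.
# In the recharging model it is assumed to be non-decreasing.
PayoffFn = Callable[[int], float]


def schedule_value(payoffs: Sequence[PayoffFn], schedule: Sequence[int]) -> float:
    """Total mean payoff of playing arm schedule[t] in round t."""
    # Assumption (not from the paper): every arm starts with delay 1.
    delay = [1] * len(payoffs)
    total = 0.0
    for arm in schedule:
        total += payoffs[arm](delay[arm])
        delay = [d + 1 for d in delay]  # every arm recharges for one more round
        delay[arm] = 1                  # the played arm is reset
    return total


def greedy_schedule(payoffs: Sequence[PayoffFn], horizon: int) -> List[int]:
    """Myopic baseline: each round, play the arm with the highest current mean."""
    delay = [1] * len(payoffs)
    plan: List[int] = []
    for _ in range(horizon):
        arm = max(range(len(payoffs)), key=lambda i: payoffs[i](delay[i]))
        plan.append(arm)
        delay = [d + 1 for d in delay]
        delay[arm] = 1
    return plan


# Hypothetical instance: arm 0 recharges to full payoff over 3 rounds,
# arm 1 pays a constant 0.5 (constant functions are also non-decreasing).
example_payoffs: List[PayoffFn] = [
    lambda d: min(d, 3) / 3.0,
    lambda d: 0.5,
]

if __name__ == "__main__":
    greedy = greedy_schedule(example_payoffs, horizon=6)
    patient = [1, 1, 0, 1, 1, 0]  # lets arm 0 recharge fully before playing it
    print(greedy, schedule_value(example_payoffs, greedy))
    print(patient, schedule_value(example_payoffs, patient))
```

On this toy instance the myopic rule alternates between the two arms and collects roughly 3.5 in total, while the schedule that waits for arm 0 to recharge fully collects 4.0. Picking the currently best arm can therefore be suboptimal, which is the kind of trade-off the planning problem has to resolve.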

Videos

Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Age of Information Optimization