Rotting Bandits
Nir Levine, Koby Crammer, Shie Mannor

TL;DR
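This paper introduces Rotting Bandits, a variant of the Multi-Armed Bandits (MAB) problem in which each arm's expected reward decays as a function of the number of times it has been pulled. It presents algorithms and theoretical analysis for this non-stationary setting, motivated by real-world applications such as online advertising, content recommendation, and crowdsourcing.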
Contribution
The paper formulates the Rotting Bandits problem, proposes algorithms to address reward decay, and offers theoretical guarantees and simulations demonstrating effectiveness.
Findings
The proposed algorithms are designed for arms whose expected reward decays with the number of pulls.
Theoretical guarantees accompany the proposed algorithms.
Simulations are used to evaluate the proposed methods.
Abstract
The Multi-Armed Bandits (MAB) framework highlights the tension between acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation). In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward. The decision maker's objective is to maximize her cumulative expected reward over the time horizon. The MAB problem has been studied extensively, specifically under the assumption that the arms' reward distributions are stationary, or quasi-stationary, over time. We consider a variant of the MAB framework, which we term Rotting Bandits, where each arm's expected reward decays as a function of the number of times it has been pulled. We are motivated by many real-world scenarios such as online advertising, content recommendation, crowdsourcing, and more. We present algorithms, accompanied by simulations, and…
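To make the setting concrete, here is a minimal Python sketch of a bandit whose arms "rot" with use. It is an illustration under stated assumptions, not the authors' algorithm: the power-law decay form, the names RottingArm and run_window_greedy, and the parameters base, decay, noise, and window are all hypothetical choices made for this example. The policy shown is a simple heuristic that pulls each arm once and then picks the arm with the highest average over its most recent rewards, so that stale observations from before the arm decayed are ignored.

```python
import random


class RottingArm:
    """Arm whose expected reward depends on how often it has been pulled.

    Assumption (not from the paper): expected reward follows
    base * (pulls + 1) ** (-decay), which is non-increasing in pulls.
    """

    def __init__(self, base, decay, noise=0.0):
        self.base = base      # expected reward of the first pull
        self.decay = decay    # 0.0 means the arm never rots
        self.noise = noise    # std. dev. of Gaussian reward noise
        self.pulls = 0

    def expected_reward(self):
        return self.base * (self.pulls + 1) ** (-self.decay)

    def pull(self, rng):
        reward = self.expected_reward() + rng.gauss(0.0, self.noise)
        self.pulls += 1       # pulling the arm is what makes it rot
        return reward


def run_window_greedy(arms, horizon, window=5, seed=0):
    """Illustrative heuristic: greedy on the mean of the last `window` rewards."""
    rng = random.Random(seed)
    history = [[] for _ in arms]
    total = 0.0
    for _ in range(horizon):
        untried = [i for i, h in enumerate(history) if not h]
        if untried:
            choice = untried[0]  # pull every arm once before comparing
        else:
            def recent_mean(k):
                recent = history[k][-window:]
                return sum(recent) / len(recent)
            choice = max(range(len(arms)), key=recent_mean)
        reward = arms[choice].pull(rng)
        history[choice].append(reward)
        total += reward
    return total, [arm.pulls for arm in arms]


if __name__ == "__main__":
    arms = [RottingArm(1.0, 0.5, noise=0.1), RottingArm(0.5, 0.0, noise=0.1)]
    total, pulls = run_window_greedy(arms, horizon=200)
    print("cumulative reward:", round(total, 2), "pulls per arm:", pulls)
```

In this toy instance the first arm starts out better but decays, while the second is stationary, so a policy that relies only on long-run averages would keep overestimating the first arm. Restricting the estimate to recent pulls is one simple way to track the current expected reward.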
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Mobile Crowdsensing and Crowdsourcing
