Rising Rested Bandits: Lower Bounds and Efficient Algorithms

Marco Fiandri; Alberto Maria Metelli; Francesco Trovò

arXiv:2411.14446·stat.ML·November 28, 2024

Open Access

TL;DR

This paper studies regret minimization in a class of rested multi-armed bandits whose arms have non-decreasing, concave expected rewards, deriving regret lower bounds and proposing an efficient algorithm with competitive regret bounds.

Contribution

It introduces the R-ed-UCB algorithm for this class of bandits, providing regret bounds and empirical comparisons with existing methods.

Findings

01

Derived regret lower bounds for the class of monotonic, concave reward functions.

02

Proposed R-ed-UCB algorithm with regret bounds of order $\tilde{O}(T^{2/3})$ under certain conditions.

03

Empirical results show competitive performance against state-of-the-art methods.

Abstract

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested bandits in which the arms' expected reward is monotonically non-decreasing and concave. We study the inherent sample complexity of the regret minimization problem by deriving suitable regret lower bounds. Then, we design an algorithm for the rested case, R-ed-UCB, providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\tilde{O}(T^{2/3})$. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
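To make the rested setting concrete, the sketch below simulates a small rising rested bandit in Python. It is an illustration only and is not R-ed-UCB. The parametric reward curve `c * (1 - exp(-a * n))`, the two example arms, the horizon, and all function names are assumptions introduced here. Noise is omitted so the numbers are deterministic, and regret is measured against the best fixed arm, which is also an assumption of this sketch.

```python
import math

# Hypothetical arm model (not from the paper): mu(n) = c * (1 - exp(-a * n)).
# For a, c > 0 this is non-decreasing and concave in the pull count n.
def expected_reward(params, n):
    c, a = params
    return c * (1.0 - math.exp(-a * n))

def run_policy(arms, horizon, choose):
    """Play choose(t, pulls) for `horizon` rounds.

    Returns (cumulative expected reward, per-arm pull counts).
    """
    pulls = [0] * len(arms)
    total = 0.0
    for t in range(horizon):
        i = choose(t, pulls)
        # Rested setting: the reward of arm i depends on how many times
        # arm i itself has been pulled, not on the global round t.
        total += expected_reward(arms[i], pulls[i])
        pulls[i] += 1
    return total, pulls

def best_fixed_arm_value(arms, horizon):
    """Cumulative expected reward of the best arm if pulled every round."""
    return max(sum(expected_reward(p, n) for n in range(horizon)) for p in arms)

def regret(arms, horizon, choose):
    """Regret against the best fixed arm (comparator assumed for this sketch)."""
    total, _ = run_policy(arms, horizon, choose)
    return best_fixed_arm_value(arms, horizon) - total

# Assumed example instance: arm 0 rises slowly to a high value,
# arm 1 rises quickly to a lower value.
ARMS = [(0.9, 0.01), (0.5, 0.5)]
HORIZON = 500

def round_robin(t, pulls):
    return t % len(ARMS)

def always_first(t, pulls):
    return 0

if __name__ == "__main__":
    print("round-robin regret:", regret(ARMS, HORIZON, round_robin))
    print("always-arm-0 regret:", regret(ARMS, HORIZON, always_first))
```

In this assumed instance the slowly rising arm looks worse early on but accumulates more reward over the full horizon, so a learner that splits its pulls evenly pays for every pull spent on the other arm. This is the kind of trade-off that makes the rested rising setting harder than a stationary bandit.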

Taxonomy

Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Advanced Wireless Network Optimization