Rising Rested Bandits: Lower Bounds and Efficient Algorithms
Marco Fiandri, Alberto Maria Metelli, Francesco Trovò

TL;DR
This paper investigates the sample complexity of a specific class of rested multi-armed bandits whose expected rewards are non-decreasing and concave, deriving regret lower bounds and proposing an efficient algorithm with competitive regret bounds.
Contribution
It introduces the R-ed-UCB algorithm for this class of bandits, providing regret bounds and empirical comparisons with existing methods.
Findings
Derived regret lower bounds for the class of monotonic, concave reward functions.
Proposed R-ed-UCB algorithm with regret bounds of order $\tilde{O}(T^{2/3})$ under certain conditions.
Empirical results show competitive performance against state-of-the-art methods.
Abstract
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested bandits in which the arms' expected reward is monotonically non-decreasing and concave. We study the inherent sample complexity of the regret minimization problem by deriving suitable regret lower bounds. Then, we design an algorithm for the rested case, R-ed-UCB, providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\tilde{O}(T^{2/3})$. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
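To make the setting in the abstract concrete, the sketch below simulates a toy rested rising bandit: each arm's expected reward depends only on how many times that arm itself has been pulled, and it grows in a non-decreasing, concave way. The curve `scale * (1 - exp(-rate * n))`, the class name `RestedRisingBandit`, and all parameter values are assumptions chosen for illustration. They are not taken from the paper, and this is not the R-ed-UCB algorithm.

```python
import math
import random


def expected_reward(n_pulls, scale, rate):
    """Hypothetical rising curve: non-decreasing and concave in n_pulls."""
    return scale * (1.0 - math.exp(-rate * n_pulls))


class RestedRisingBandit:
    """Toy environment: an arm's mean evolves only when that arm is pulled."""

    def __init__(self, arms, noise_std=0.05, seed=0):
        self.arms = arms                  # list of (scale, rate) pairs, one per arm
        self.pulls = [0] * len(arms)      # per-arm pull counters (the "rested" state)
        self.noise_std = noise_std
        self.rng = random.Random(seed)

    def pull(self, i):
        """Pull arm i: advance its counter and return a noisy reward."""
        self.pulls[i] += 1
        scale, rate = self.arms[i]
        mean = expected_reward(self.pulls[i], scale, rate)
        return mean + self.rng.gauss(0.0, self.noise_std)


if __name__ == "__main__":
    # Arm 0 rises slowly toward a higher plateau; arm 1 rises fast toward a lower one.
    env = RestedRisingBandit([(1.0, 0.1), (0.8, 0.5)])
    for t in range(5):
        print(t, env.pull(t % 2))
```

In this toy instance the arm that looks best early (arm 1) is not the one with the highest long-run reward (arm 0), which illustrates why a learner in this setting has to reason about how each arm's reward will grow rather than only about its current average.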
Taxonomy
Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Advanced Wireless Network Optimization
