Parameter-Free Dynamic Regret for Unconstrained Linear Bandits
Alberto Rumi, Andrew Jacobsen, Nicolò Cesa-Bianchi, Fabio Vitale

TL;DR
This paper introduces a parameter-free algorithm for unconstrained adversarial linear bandits that adaptively minimizes dynamic regret without prior knowledge of comparator switches, achieving optimal bounds.
Contribution
It presents the first algorithm for linear bandits that attains the optimal dynamic regret bound of order √(d(1+S_T)T), up to poly-logarithmic factors, without knowing the number of comparator switches S_T in advance.
Findings
Achieves dynamic regret of order √(d(1+S_T)T), up to poly-logarithmic factors.
Provides a simple method to combine guarantees of multiple bandit algorithms.
Resolves a long-standing open problem in adaptive regret minimization.
Abstract
We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators in ℝ^d, but receives only point-evaluation feedback on each round. We provide a simple approach to combining the guarantees of several bandit algorithms, allowing us to optimally adapt to the number of switches S_T of an arbitrary comparator sequence. In particular, we provide the first algorithm for linear bandits achieving the optimal regret guarantee of order √(d(1+S_T)T) up to poly-logarithmic terms without prior knowledge of S_T, thus resolving a long-standing open problem.
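To make the quantities in the abstract concrete, here is a minimal sketch of how the switch count and the dynamic regret of a linear problem could be computed on a toy sequence. It is not the paper's algorithm. It assumes the standard definitions, which the abstract does not spell out: the switch count S_T is the number of rounds in which the comparator differs from the previous one, and the dynamic regret is the sum over rounds of ⟨ℓ_t, w_t − u_t⟩. The names `count_switches`, `dynamic_regret`, `losses`, `plays`, and `comparators` are hypothetical, chosen only for illustration.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two vectors of equal dimension."""
    return sum(x * y for x, y in zip(a, b))


def count_switches(comparators: Sequence[Vector]) -> int:
    """Assumed definition of S_T: rounds t >= 2 where u_t != u_{t-1}."""
    return sum(
        1
        for prev, cur in zip(comparators, comparators[1:])
        if tuple(prev) != tuple(cur)
    )


def dynamic_regret(
    losses: Sequence[Vector],
    plays: Sequence[Vector],
    comparators: Sequence[Vector],
) -> float:
    """Assumed definition: sum over t of <l_t, w_t> - <l_t, u_t>."""
    return sum(
        dot(l, w) - dot(l, u)
        for l, w, u in zip(losses, plays, comparators)
    )


# Toy instance with d = 2 and T = 4 (illustrative numbers only).
losses = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
plays = [(0.0, 0.0)] * 4  # a learner that always plays the origin
comparators = [(-1.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]

switches = count_switches(comparators)
regret = dynamic_regret(losses, plays, comparators)
print(switches, regret)
```

In this toy instance the comparator changes once, and the learner that stays at the origin loses one unit to the comparator in each of the two rounds where the comparator is aligned against the loss vector. Note that the sketch evaluates regret with the full loss vectors for clarity, whereas a bandit learner would only observe the scalar ⟨ℓ_t, w_t⟩ on each round.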
