A Perturbation Approach to Unconstrained Linear Bandits
Andrew Jacobsen, Dorian Baudry, Shinji Ito, Nicolò Cesa-Bianchi

TL;DR
This paper revisits perturbation methods in unconstrained bandit linear optimization, showing they reduce the problem to online linear optimization and providing new regret guarantees and lower bounds.
Contribution
It introduces a perturbation scheme that reduces unconstrained BLO to OLO, yielding improved regret bounds, the first high-probability guarantees, and a lower-bound analysis.
Findings
Derived expected-regret guarantees with comparator-adaptive algorithms.
Extended analysis to dynamic regret with optimal path-length dependency.
Proved lower bounds and established the $\Omega(\sqrt{dT})$ regret rate.
Abstract
We revisit the standard perturbation-based approach of Abernethy et al. (2008) in the context of unconstrained Bandit Linear Optimization (uBLO). We show the surprising result that in the unconstrained setting, this approach effectively reduces Bandit Linear Optimization (BLO) to a standard Online Linear Optimization (OLO) problem. Our framework improves on prior work in several ways. First, we derive expected-regret guarantees when our perturbation scheme is combined with comparator-adaptive OLO algorithms, leading to new insights about the impact of different adversarial models on the resulting comparator-adaptive rates. We also extend our analysis to dynamic regret, obtaining the optimal path-length dependencies without prior knowledge of the path length. We then develop the first high-probability guarantees for both static and dynamic regret in uBLO. Finally, we discuss lower…
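This summary does not spell out the paper's perturbation scheme. As a rough illustration of how a perturbation can turn scalar bandit feedback into a loss-vector estimate that an OLO algorithm could consume, the sketch below uses a generic spherical one-point estimator. That choice is an assumption made here for illustration and is not necessarily the scheme of Abernethy et al. (2008) or of this paper. The function names and the parameter `delta` are hypothetical.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def one_point_estimate(w, g, delta, rng):
    """Perturb the OLO iterate w, observe one scalar loss, return an estimate of g.

    Assumption: generic spherical perturbation, not the paper's exact scheme.
    """
    d = len(w)
    s = sample_unit_sphere(d, rng)
    played = [wi + delta * si for wi, si in zip(w, s)]
    # Bandit feedback: the learner sees only this scalar, never g itself.
    loss = sum(gi * xi for gi, xi in zip(g, played))
    # Unbiased because E[s] = 0 and E[s s^T] = I / d for s uniform on the sphere.
    return [(d / delta) * loss * si for si in s]


def average_estimate(w, g, delta, n, seed=0):
    """Average n independent estimates for a fixed loss vector g."""
    rng = random.Random(seed)
    total = [0.0] * len(w)
    for _ in range(n):
        est = one_point_estimate(w, g, delta, rng)
        total = [t + e for t, e in zip(total, est)]
    return [t / n for t in total]


if __name__ == "__main__":
    g = [1.0, -2.0, 0.5]
    print(average_estimate([0.5, 0.0, -0.5], g, delta=1.0, n=100000))
```

Averaging many estimates for a fixed loss vector should approximately recover that vector, which is the sense in which the estimates can stand in for full-information OLO feedback. In an unconstrained domain the perturbed point is always playable, so no projection or barrier is needed in this sketch.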
