Self-Concordant Perturbations for Linear Bandits
Lucas Lévy, Jean-Lou Valeau, Arya Akhavan, Patrick Rebeschini

TL;DR
This paper introduces a unified framework for adversarial linear bandits using self-concordant perturbations, leading to a new algorithm that improves regret bounds, especially in high-dimensional settings.
Contribution
It extends the connection between FTRL and FTPL methods with self-concordant perturbations, resulting in a novel algorithm with improved regret bounds in linear bandit problems.
Findings
Achieves regret of O(d√(n log n)) on both the d-dimensional hypercube and the ℓ₂ ball.
Matches the SCRiBLe rate on the ℓ₂ ball.
Improves the regret bound by a factor of √d on the hypercube.
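To make the comparison in the findings concrete, the following worked bound may help. It assumes the commonly stated form of the SCRiBLe guarantee, in which regret scales with the self-concordance parameter ϑ of the barrier; that form and the values of ϑ below are assumptions brought in for illustration and are not given in this summary.

```latex
% Assumed form of the SCRiBLe bound (not stated in this summary):
%   \vartheta is the self-concordance parameter of the barrier.
\[
  R_n^{\mathrm{SCRiBLe}} = O\!\left(d\sqrt{\vartheta\, n \log n}\right)
\]
% Hypercube with a logarithmic barrier, \vartheta = O(d):
\[
  R_n^{\mathrm{SCRiBLe}} = O\!\left(d^{3/2}\sqrt{n \log n}\right)
  \quad\text{versus}\quad
  O\!\left(d\sqrt{n \log n}\right) \text{ reported here, a } \sqrt{d} \text{ gap.}
\]
% \ell_2 ball, \vartheta = O(1):
\[
  R_n^{\mathrm{SCRiBLe}} = O\!\left(d\sqrt{n \log n}\right),
  \quad\text{the same rate as reported here.}
\]
```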
Abstract
We consider the adversarial linear bandits setting and present a unified algorithmic framework that bridges Follow-the-Regularized-Leader (FTRL) and Follow-the-Perturbed-Leader (FTPL) methods, extending the known connection between them from the full-information setting. Within this framework, we introduce self-concordant perturbations, a family of probability distributions that mirror the role of self-concordant barriers previously employed in the FTRL-based SCRiBLe algorithm. Using this idea, we design a novel FTPL-based algorithm that combines self-concordant regularization with efficient stochastic exploration. Our approach achieves a regret of O(d√(n log n)) on both the d-dimensional hypercube and the ℓ₂ ball. On the ℓ₂ ball, this matches the rate attained by SCRiBLe. For the hypercube, this represents a √d improvement over these methods and…
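The full-information FTRL/FTPL connection that the abstract builds on can be illustrated with a classic textbook case. The sketch below is not the paper's algorithm: it assumes the probability simplex as the action set, a negative-entropy regularizer for FTRL, and i.i.d. Gumbel perturbations for FTPL. Under those assumptions the two methods play the same distribution over actions, which the code checks numerically. All function names, the loss vector, and the learning rate `eta` are hypothetical.

```python
import numpy as np

def ftrl_entropy_weights(cum_loss, eta):
    """FTRL on the simplex with a negative-entropy regularizer.

    The closed-form solution is the exponential-weights distribution,
    p_i proportional to exp(-eta * cum_loss_i).
    """
    z = -eta * cum_loss
    z = z - z.max()          # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def ftpl_gumbel_frequencies(cum_loss, eta, n_samples, rng):
    """Empirical play frequencies of FTPL with i.i.d. Gumbel perturbations.

    Each sample perturbs the scaled negative cumulative losses with
    standard Gumbel noise and plays the resulting leader (argmax).
    """
    noise = rng.gumbel(size=(n_samples, cum_loss.size))
    choices = np.argmax(-eta * cum_loss + noise, axis=1)
    return np.bincount(choices, minlength=cum_loss.size) / n_samples

rng = np.random.default_rng(0)
cum_loss = np.array([1.0, 2.0, 0.5, 3.0])   # illustrative cumulative losses
eta = 0.7                                   # illustrative learning rate

p_ftrl = ftrl_entropy_weights(cum_loss, eta)
p_ftpl = ftpl_gumbel_frequencies(cum_loss, eta, 200_000, rng)

print("FTRL (exponential weights):", np.round(p_ftrl, 3))
print("FTPL (Gumbel, empirical):  ", np.round(p_ftpl, 3))
```

The two printed vectors should agree up to sampling noise. According to the abstract, the paper extends this kind of correspondence to the bandit setting, with self-concordant perturbations playing the role that self-concordant barriers play in SCRiBLe.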
