Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits
Marc Abeille, Louis Faury, Clément Calauzènes

TL;DR
This paper introduces a new algorithm for logistic bandits that achieves near-optimal regret bounds by better accounting for non-linearity effects, significantly improving over previous guarantees.
Contribution
The paper presents a novel algorithm with refined analysis for logistic bandits, providing improved problem-dependent regret bounds and establishing minimax optimality.
Findings
Regret scales as $\tilde{\mathcal{O}}(d\sqrt{T/\kappa})$ in favorable cases.
The new bounds outperform previous guarantees by a significant margin.
Identifies two regimes of regret, linking non-linearity effects to exploration-exploitation trade-offs.
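As a rough illustration of the favorable-case rate $\tilde{\mathcal{O}}(d\sqrt{T/\kappa})$, the sketch below evaluates only its leading term $d\sqrt{T/\kappa}$ and drops constants and logarithmic factors. It compares that term with a $\kappa$-free $d\sqrt{T}$ term, which serves purely as an illustrative baseline (an assumption, not a bound quoted in this summary). The function name `leading_regret` and the values of `d`, `T` and `kappa` are hypothetical.

```python
import math


def leading_regret(d: int, T: int, kappa: float = 1.0) -> float:
    """Hypothetical helper: leading term d * sqrt(T / kappa).

    Constants and log factors hidden by the O-tilde notation are ignored.
    With kappa = 1 this reduces to the illustrative d * sqrt(T) baseline.
    """
    return d * math.sqrt(T / kappa)


d, T = 10, 1_000_000  # illustrative dimension and horizon
for kappa in (1.0, 100.0, 10_000.0):
    baseline = leading_regret(d, T)          # kappa-free d*sqrt(T) term
    refined = leading_regret(d, T, kappa)    # favorable-case d*sqrt(T/kappa) term
    print(
        f"kappa={kappa:>8.0f}  d*sqrt(T)={baseline:10.1f}  "
        f"d*sqrt(T/kappa)={refined:10.1f}  ratio={baseline / refined:6.1f}"
    )
```

The ratio in the last column equals $\sqrt{\kappa}$. The larger the non-linearity constant, the wider the gap between the two leading terms.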
Abstract
Logistic Bandits have recently attracted substantial attention, by providing an uncluttered yet challenging framework for understanding the impact of non-linearity in parametrized bandits. It was shown by Faury et al. (2020) that the learning-theoretic difficulties of Logistic Bandits can be embodied by a large (sometimes prohibitively) problem-dependent constant $\kappa$, characterizing the magnitude of the reward's non-linearity. In this paper we introduce a novel algorithm for which we provide a refined analysis. This allows for a better characterization of the effect of non-linearity and yields improved problem-dependent guarantees. In most favorable cases this leads to a regret upper-bound scaling as $\tilde{\mathcal{O}}(d\sqrt{T/\kappa})$, which dramatically improves over the state-of-the-art guarantees. We prove that this rate is…
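To show why $\kappa$ can be "sometimes prohibitively" large, here is a minimal sketch under stated assumptions. It uses the standard logistic bandit model, where playing arm $x$ yields a Bernoulli reward with mean $\mu(x^\top\theta)$ and $\mu$ is the sigmoid. It takes $\kappa$ to be the worst-case inverse slope $1/\dot\mu(x^\top\theta)$ over a small arm set at a single parameter $\theta$. This is a simplification: definitions in the literature also take a supremum over a set of admissible parameters. The arm set, `theta` values and the helper names `sigmoid_slope` and `kappa` are hypothetical. Because $1/\dot\mu(z) = 2 + e^{z} + e^{-z}$, the printed $\kappa$ grows exponentially with $\|\theta\|$.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic link mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_slope(z: float) -> float:
    """Derivative of the logistic link: mu'(z) = mu(z) * (1 - mu(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def kappa(arms, theta) -> float:
    """Hypothetical, simplified kappa: max over arms of 1 / mu'(x^T theta).

    Evaluated at a single parameter theta (an assumption; a sup over a
    parameter set would give an even larger value).
    """
    return max(
        1.0 / sigmoid_slope(sum(a * t for a, t in zip(x, theta)))
        for x in arms
    )


# Hypothetical arm set: the four unit vectors of R^2.
arms = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
for norm in (0.0, 2.0, 5.0, 10.0):
    theta = (norm, 0.0)
    print(f"||theta||={norm:4.1f}  kappa={kappa(arms, theta):12.1f}")
```

At $\theta = 0$ the link is locally linear and $\kappa = 4$. With $\|\theta\| = 10$ it already exceeds $2 \times 10^4$, which illustrates how a few large-margin arms make the reward strongly non-linear.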
