Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions
Seoungbin Bae, Dabeen Lee

TL;DR
This paper introduces SupSplitLog, an algorithm for logistic bandits that achieves near-optimal $\tilde{O}(\sqrt{dT})$ regret without the context diversity assumptions that existing rate-optimal methods rely on.
Contribution
SupSplitLog is the first logistic bandit algorithm to attain $\tilde{O}(\sqrt{dT})$ regret without context diversity assumptions, using a novel sample-splitting approach.
Findings
SupSplitLog achieves $\tilde{O}(\sqrt{dT})$ regret without context diversity.
The algorithm improves dependence on dimension $d$ in the regret bound.
Experimental results confirm the theoretical advantages of SupSplitLog.
Abstract
We study the $K$-armed logistic bandit problem, where at each round, the agent observes feature vectors associated with actions. Existing approaches that achieve a rate-optimal regret bound rely heavily on context diversity assumptions, such as strict positivity of the minimum eigenvalue of a context covariance matrix. These assumptions, however, impose strong restrictions on the context process, as they rule out the situation where the context vectors are concentrated in a low-dimensional subspace. In this paper, we propose SupSplitLog, which, to the best of our knowledge, is the first algorithm for logistic bandits that achieves $\tilde{O}(\sqrt{dT})$ regret without any context diversity assumption. The key idea is to split the collected samples into two disjoint subsets when constructing estimators; one is used to compute an…
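To make two points from the abstract concrete, here is a minimal Python sketch. It is not the authors' SupSplitLog algorithm. It only illustrates (a) that contexts confined to a low-dimensional subspace give a second-moment matrix whose minimum eigenvalue is zero, which is the case diversity assumptions exclude, and (b) what splitting collected samples into two disjoint subsets can look like. All function names, the alternating-index split rule, the ridge-regularized Newton solver, and the simulated data are assumptions made for illustration; the role of the second subset is left open because the abstract is truncated here.

```python
import numpy as np


def sigmoid(z):
    """Logistic link function."""
    return 1.0 / (1.0 + np.exp(-z))


def min_eigenvalue_of_second_moment(contexts):
    """Smallest eigenvalue of the empirical second-moment matrix (1/n) X^T X."""
    second_moment = contexts.T @ contexts / contexts.shape[0]
    return float(np.linalg.eigvalsh(second_moment)[0])  # eigenvalues are ascending


def split_samples(contexts, rewards):
    """Split samples into two disjoint subsets by alternating indices (assumed rule)."""
    idx = np.arange(contexts.shape[0])
    first, second = idx[::2], idx[1::2]
    return (contexts[first], rewards[first]), (contexts[second], rewards[second])


def fit_logistic_mle(contexts, rewards, ridge=1.0, steps=25):
    """Ridge-regularized logistic maximum likelihood via Newton's method."""
    d = contexts.shape[1]
    theta = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(contexts @ theta)
        grad = contexts.T @ (p - rewards) + ridge * theta
        weights = p * (1.0 - p)
        hess = (contexts * weights[:, None]).T @ contexts + ridge * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta


rng = np.random.default_rng(0)
d, n = 5, 400

# Simulated contexts confined to a 2-dimensional subspace of R^5.
basis = rng.normal(size=(2, d))
contexts = rng.normal(size=(n, 2)) @ basis
contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)

# Simulated binary rewards from a logistic model with a hypothetical parameter.
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
rewards = rng.binomial(1, sigmoid(contexts @ theta_star)).astype(float)

lam_min = min_eigenvalue_of_second_moment(contexts)  # numerically zero
part_a, part_b = split_samples(contexts, rewards)    # two disjoint subsets
theta_hat = fit_logistic_mle(*part_a)                # estimate from the first subset only
```

The ridge term keeps the Hessian invertible even though the contexts span only two dimensions, so the fit on the first subset is well defined without any assumption on the minimum eigenvalue.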
