Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

Seoungbin Bae; Dabeen Lee

arXiv:2604.22161·cs.LG·May 1, 2026

TL;DR

This paper introduces SupSplitLog, an algorithm for $K$-armed logistic bandits that achieves $\tilde{O}(\sqrt{dT})$ regret without the context diversity assumptions that existing rate-optimal methods rely on.

Contribution

SupSplitLog is the first logistic bandit algorithm to attain $\tilde{O}(\sqrt{dT})$ regret without context diversity assumptions, using a novel sample-splitting approach.

Findings

01

SupSplitLog achieves $\tilde{O}(\sqrt{dT})$ regret without context diversity.

02

The algorithm improves dependence on dimension $d$ in the regret bound.

03

Experimental results confirm the theoretical advantages of SupSplitLog.

Abstract

We study the $K$-armed logistic bandit problem, where at each round, the agent observes $K$ feature vectors associated with $K$ actions. Existing approaches that achieve a rate-optimal $\tilde{O}(\sqrt{dT})$ regret bound rely heavily on context diversity assumptions, such as strict positivity of the minimum eigenvalue of a context covariance matrix. These assumptions, however, impose strong restrictions on the context process, as they rule out the situation where the context vectors are concentrated in a low-dimensional subspace. In this paper, we propose SupSplitLog, which, to the best of our knowledge, is the first algorithm for logistic bandits that achieves $\tilde{O}(\sqrt{dT})$ regret without any context diversity assumption. The key idea is to split the collected samples into two disjoint subsets when constructing estimators; one is used to compute an…
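To make the restriction described in the abstract concrete, the following is a minimal sketch of one round of a $K$-armed logistic bandit and of why contexts confined to a low-dimensional subspace break a strict-positivity condition on the minimum eigenvalue. It is not the SupSplitLog algorithm. The names `sample_contexts`, `rayleigh_quotient`, and `instantaneous_regret`, the Gaussian context model, the parameter `theta`, and the use of the empirical second-moment matrix as a stand-in for the "context covariance matrix" are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic link function mapping a linear score to a success probability."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def sample_contexts(K: int, d: int, rank: int, rng: random.Random):
    """Draw K feature vectors in R^d whose last d - rank coordinates are zero.

    With rank < d, every context lies in a rank-dimensional subspace
    (hypothetical context model, chosen only for illustration).
    """
    return [
        [rng.gauss(0.0, 1.0) if j < rank else 0.0 for j in range(d)]
        for _ in range(K)
    ]


def rayleigh_quotient(contexts, v) -> float:
    """Return v^T Sigma v / ||v||^2 for Sigma = mean of x x^T over contexts.

    This quantity upper-bounds the minimum eigenvalue of Sigma, so a value
    of zero forces the minimum eigenvalue to be zero (Sigma is PSD).
    """
    quad = sum(dot(x, v) ** 2 for x in contexts) / len(contexts)
    return quad / dot(v, v)


def instantaneous_regret(contexts, chosen: int, theta) -> float:
    """Gap in expected reward between the best arm and the chosen arm."""
    means = [sigmoid(dot(x, theta)) for x in contexts]
    return max(means) - means[chosen]


rng = random.Random(0)
d, K = 5, 10
theta = [0.5, -0.3, 0.8, 0.1, -0.6]  # hypothetical unknown parameter
e_last = [0.0] * (d - 1) + [1.0]     # direction outside the subspace

# Contexts concentrated in a 3-dimensional subspace of R^5.
degenerate = sample_contexts(K, d, rank=3, rng=rng)
bound_degenerate = rayleigh_quotient(degenerate, e_last)

# Full-dimensional contexts for comparison.
diverse = sample_contexts(K, d, rank=d, rng=rng)
bound_diverse = rayleigh_quotient(diverse, e_last)

# One round: pull arm 0, observe a Bernoulli reward, measure regret.
reward = 1 if rng.random() < sigmoid(dot(degenerate[0], theta)) else 0
regret = instantaneous_regret(degenerate, 0, theta)

print(bound_degenerate, bound_diverse, reward, regret)
```

In the degenerate case the Rayleigh quotient along `e_last` is exactly zero, so the minimum eigenvalue of the second-moment matrix is zero and a strict-positivity assumption cannot hold. The full-dimensional contexts give a strictly positive value in the same direction.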
