Efficient Simple Regret Algorithms for Stochastic Contextual Bandits
Shuai Liu, Alireza Bakhtiari, Alex Ayoub, Botao Hao, Csaba Szepesvári

TL;DR
This paper introduces the first algorithms with provable simple regret guarantees for stochastic contextual logistic bandits, extending linear bandit results and providing practical, tractable solutions with empirical validation.
Contribution
It proposes novel algorithms achieving the first simple regret bounds for logistic bandits, including a new Thompson Sampling variant, with a leading term that is free of the constant tied to the magnitude of the unknown parameter.
Findings
Achieves simple regret $\tilde{O}(d/\sqrt{T})$ for logistic bandits.
Introduces a Thompson Sampling algorithm with regret $\tilde{O}(d^{3/2}/\sqrt{T})$.
Empirically validates theoretical guarantees through experiments.
Abstract
We study stochastic contextual logistic bandits under the simple regret objective. While simple regret guarantees have been established for the linear case, no such results were previously known for the logistic setting. Building on ideas from contextual linear bandits and self-concordant analysis, we propose the first algorithm that achieves simple regret $\tilde{O}(d/\sqrt{T})$. Notably, the leading term of our regret bound is free of the constant $\kappa$, which is governed by a bound on the magnitude of the unknown parameter vector. The algorithm is shown to be fully tractable when the action set is finite. We also introduce a new variant of Thompson Sampling tailored to the simple-regret setting. This yields the first simple regret guarantee for randomized algorithms in stochastic contextual linear bandits, with regret…
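To make the objective in the abstract concrete, here is a minimal sketch of how simple regret could be evaluated for a fixed policy in a logistic model. It assumes the standard definition (the average gap in mean reward between the best action and the chosen action), a finite set of equally likely contexts, and a hypothetical feature array `features`; none of these names come from the paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic link mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def simple_regret(features, theta, policy):
    """Average gap in mean reward between the best action and the policy's action.

    features: (n_contexts, n_actions, d) array, a hypothetical feature map phi(s, a)
    theta:    (d,) parameter vector (unknown to the learner in practice)
    policy:   (n_contexts,) integer array, the action chosen in each context
    """
    means = sigmoid(features @ theta)                  # (n_contexts, n_actions)
    best = means.max(axis=1)                           # optimal mean reward per context
    chosen = means[np.arange(len(policy)), policy]     # mean reward of the chosen action
    return float(np.mean(best - chosen))

# Illustrative data only (assumed, not from the paper).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 4, 3))
theta = np.array([1.0, -0.5, 0.25])

greedy = sigmoid(features @ theta).argmax(axis=1)      # oracle policy that knows theta
uniform = rng.integers(0, 4, size=50)                  # policy that ignores the features
```

The oracle policy has zero simple regret by construction, while any other policy has non-negative simple regret; a learning algorithm is judged by how quickly the regret of its output policy shrinks with the number of rounds $T$.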
Peer Reviews
Decision · Submitted to ICLR 2026
- The authors propose an effective algorithm for simple-regret minimization in stochastic contextual bandits. The regret guarantees are reasonable, and the authors provide sufficient explanations for their derivations.
- In particular, the finite-sample analysis is strong.
1. In my understanding, several other studies address simple-regret minimization in stochastic contextual bandits. For example, Kato et al. (2024) develop policy-learning algorithms in this setting. Their goal is to train a policy that minimizes simple regret in a best-arm-identification setting, and they characterize regret bounds using the VC dimension, which covers certain linear and logistic models. Theoretically, that analysis may be somewhat coarse, but could those results be applied to th
The logistic simple-regret setting is well motivated, and the work fills a clear gap in the literature. To my knowledge, this is the first paper to remove the dependence on the curvature constant $\kappa$ from the leading term of the regret bound. The construction of a monotone surrogate Hessian and the associated decreasing-uncertainty lemma are non-trivial and address the main technical challenge in logistic models, where the uncertainty depends on the unknown slope $\mu'(z)$. These ideas are
Although there are no fatal theoretical flaws, the paper contains numerous typographical and consistency issues that make verification difficult. The most important ones are:
- In both MULIN and SIMPLELINTS, the design matrix $V_{t+1}$ is never updated. The pseudocode should include $V_{t+1} \leftarrow V_t + \phi(S_t, A_t)\phi(S_t, A_t)^\top.$
- From Equation (17), we have $\mathcal{V}_{t+1} \subseteq \mathcal{V}_t$, but it is reversed in Rows 1420-1421.
- The term $\varphi(s,a)^\top \the
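The design-matrix update that this review asks for, $V_{t+1} \leftarrow V_t + \phi(S_t, A_t)\phi(S_t, A_t)^\top$, is a rank-one update and can be sketched as below. The initialization $V_1 = \lambda I$ and the names `update_design_matrix`, `lam`, and `observed` are assumptions for illustration, not taken from the paper's pseudocode.

```python
import numpy as np

def update_design_matrix(V, phi):
    """Return V + phi phi^T, the rank-one update described in the review."""
    return V + np.outer(phi, phi)

d, lam = 3, 1.0                      # dimension and regularizer (assumed values)
V = lam * np.eye(d)                  # assumed initialization V_1 = lam * I

rng = np.random.default_rng(0)
observed = rng.normal(size=(5, d))   # stand-ins for phi(S_t, A_t) over five rounds
for phi in observed:
    V = update_design_matrix(V, phi)
```

After the loop, `V` equals $\lambda I$ plus the sum of the outer products of the observed features, so it stays symmetric and its eigenvalues never drop below $\lambda$.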
1. The methods and analysis are unified in the sense that we can understand the intuitive and important theoretical property in the linear bandits and then the natural extension to the logistic model is described. The paper is effortless to follow, and the text is flawless.
2. SIMPLELINTS (Theorem 3) achieves $\tilde{O}(d^{3/2}/\sqrt{T})$. Based on it, they also analyze a randomized logistic algorithm based on TS. These randomized methods have computational advantages over the deterministic met
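For readers unfamiliar with Thompson Sampling, the following is a minimal sketch of one round of generic linear Thompson Sampling. It is not the paper's SIMPLELINTS: the Gaussian perturbation, the ridge estimate, and the names `thompson_action`, `b`, and `scale` are all assumptions chosen for illustration. It shows why randomized methods tend to be cheap per round: one sample and one argmax.

```python
import numpy as np

def thompson_action(V, b, action_features, rng, scale=1.0):
    """One round of generic linear Thompson Sampling (illustrative only).

    V:               (d, d) regularized design matrix
    b:               (d,) sum of reward-weighted features
    action_features: (n_actions, d) features of the actions in the current context
    """
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                          # ridge estimate of the parameter
    cov = scale**2 * (V_inv + V_inv.T) / 2.0       # symmetrized sampling covariance
    theta_tilde = rng.multivariate_normal(theta_hat, cov)
    return int(np.argmax(action_features @ theta_tilde))

# Illustrative call: a well-estimated parameter close to (1, 0).
rng = np.random.default_rng(0)
V = 1000.0 * np.eye(2)
b = np.array([1000.0, 0.0])
actions = np.array([[1.0, 0.0], [0.0, 1.0]])
action = thompson_action(V, b, actions, rng)
```

With this much data the sampled parameter stays close to the estimate, so the first action is selected; with a smaller `V` the sample is more dispersed and the choice becomes more exploratory.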
The motivation to study the simple regret in practice is not discussed.
Taxonomy
Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Stochastic Gradient Optimization Techniques
