Stochastic Linear Bandits with Parameter Noise

Daniel Ezer; Alon Peled-Cohen; Yishay Mansour

arXiv:2601.23164 · cs.LG · February 2, 2026

TL;DR

This paper analyzes stochastic linear bandits with parameter noise. It establishes tight regret bounds that depend on the maximal variance of the reward and on the action set, and shows that a simple explore-exploit algorithm achieves these bounds.

Contribution

The paper derives tight regret bounds for stochastic linear bandits with parameter noise and demonstrates that a simple explore-exploit algorithm attains these bounds.
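To make the phrase "explore-exploit" concrete, here is a minimal sketch of a generic explore-then-commit loop. It is not the paper's algorithm, whose details are not given on this page: the function `explore_then_commit`, the per-action averaging (which ignores the linear structure for brevity), the Gaussian choice for $\theta$, and all numeric values are assumptions made for illustration.

```python
import random


def explore_then_commit(actions, pull, horizon, pulls_per_action):
    """Generic explore-then-commit: sample every action equally often,
    then play the empirically best action for the remaining rounds."""
    if horizon < pulls_per_action * len(actions):
        raise ValueError("horizon is too short for the exploration phase")
    totals = [0.0] * len(actions)
    history = []
    # Exploration phase: round-robin over all actions.
    for _ in range(pulls_per_action):
        for i, action in enumerate(actions):
            r = pull(action)
            totals[i] += r
            history.append(r)
    # Commit phase: every action was pulled equally often, so the largest
    # total reward also identifies the largest empirical mean.
    best = max(range(len(actions)), key=lambda i: totals[i])
    for _ in range(horizon - len(history)):
        history.append(pull(actions[best]))
    return best, history


# Hypothetical environment: theta has independent Gaussian coordinates
# (an assumption), and the reward of action a is the inner product a^T theta.
rng = random.Random(1)
mean = [0.5, -0.2, 0.1]
std = [0.3, 0.1, 0.0]


def pull(action):
    theta = [rng.gauss(m, s) for m, s in zip(mean, std)]
    return sum(a * t for a, t in zip(action, theta))


actions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
best, history = explore_then_commit(actions, pull, horizon=1000, pulls_per_action=50)
print("committed to action index", best)
```

The length of the exploration phase is the only tuning knob in this sketch; how to set it as a function of the horizon and the variance is exactly the kind of question the regret analysis addresses.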

Findings

01

Regret upper bound of $O(d T \log(K/\delta) \sigma_{\max}^{2})$

02

Lower bound of $\Omega(d T \sigma_{\max}^{2})$, tight up to logarithmic factors whenever $\log(K) \approx d$

03

Minimax regret for $\ell_{p}$ unit balls with $p \leq 2$ is $\Theta(d T \sigma_{q}^{2})$

Abstract

We study the stochastic linear bandits with parameter noise model, in which the reward of action $a$ is $a^{\top} \theta$ where $\theta$ is sampled i.i.d. We show a regret upper bound of $O(d T \log(K/\delta) \sigma_{\max}^{2})$ for a horizon $T$, general action set of size $K$ of dimension $d$, and where $\sigma_{\max}^{2}$ is the maximal variance of the reward for any action. We further provide a lower bound of $\Omega(d T \sigma_{\max}^{2})$ which is tight (up to logarithmic factors) whenever $\log(K) \approx d$. For more specific action sets, $\ell_{p}$ unit balls with $p \leq 2$ and dual norm $q$, we show that the minimax regret is $\Theta(d T \sigma_{q}^{2})$, where $\sigma_{q}^{2}$ is a variance-dependent quantity that is always at most $4$. This is in contrast to the minimax regret attainable for such sets in the classic additive noise…
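The reward model in the abstract can be simulated in a few lines. The sketch below assumes, purely for illustration, that $\theta$ has independent Gaussian coordinates, so that $\mathrm{Var}(a^{\top}\theta) = \sum_i a_i^2 s_i^2$ with $s_i$ the per-coordinate standard deviation. The helper names and all numeric values are hypothetical and do not come from the paper.

```python
import random


def sample_theta(mean, std, rng):
    """Draw theta with independent Gaussian coordinates (an illustrative assumption)."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def reward(action, theta):
    """Reward in the parameter-noise model: the inner product a^T theta."""
    return sum(a * t for a, t in zip(action, theta))


def reward_variance(action, std):
    """Var(a^T theta) = sum_i a_i^2 * std_i^2 when the coordinates are independent."""
    return sum((a * s) ** 2 for a, s in zip(action, std))


# Hypothetical problem instance with d = 3 and K = 3 actions.
mean = [0.5, -0.2, 0.1]
std = [0.3, 0.1, 0.0]
actions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# sigma_max^2: the largest reward variance over the action set.
sigma_max_sq = max(reward_variance(a, std) for a in actions)

# Empirical check of the variance formula for the first action.
rng = random.Random(0)
samples = [reward(actions[0], sample_theta(mean, std, rng)) for _ in range(20000)]
emp_mean = sum(samples) / len(samples)
emp_var = sum((x - emp_mean) ** 2 for x in samples) / len(samples)

print("sigma_max^2:", sigma_max_sq)
print("empirical variance of action 0:", emp_var)
```

Unlike an additive noise term, the noise here scales with the chosen action: in this instance the third action has zero reward variance, while the first attains $\sigma_{\max}^{2}$.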

Taxonomy

Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Stochastic Gradient Optimization Techniques