Stochastic Linear Bandits with Parameter Noise
Daniel Ezer, Alon Peled-Cohen, Yishay Mansour

TL;DR
This paper analyzes stochastic linear bandits with parameter noise, establishing tight regret bounds that depend on the variance of the reward distribution and the action set, and introduces a simple algorithm to achieve these bounds.
Contribution
The paper derives tight regret bounds for stochastic linear bandits with parameter noise and demonstrates that a simple explore-exploit algorithm attains these bounds.
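The explore-exploit idea can be illustrated with a generic explore-then-commit loop. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name `explore_then_commit`, the round-robin exploration schedule, the least-squares estimate, and the example numbers are all hypothetical choices made for illustration.

```python
import numpy as np

def explore_then_commit(actions, sample_theta, horizon, n_explore):
    """Illustrative explore-then-commit loop (a sketch, not the paper's algorithm).

    actions:      (K, d) array, one candidate action per row.
    sample_theta: callable returning a fresh d-vector theta_t each round.
    Returns (index of the committed action, total collected reward).
    """
    K, _ = actions.shape
    played, rewards = [], []

    # Exploration: cycle through the action set and record the rewards.
    for t in range(n_explore):
        a = actions[t % K]
        played.append(a)
        rewards.append(float(a @ sample_theta()))  # parameter noise: r_t = <a, theta_t>

    # Least-squares estimate of the mean parameter E[theta].
    theta_hat, *_ = np.linalg.lstsq(np.array(played), np.array(rewards), rcond=None)

    # Exploitation: commit to the empirically best action for the remaining rounds.
    best = int(np.argmax(actions @ theta_hat))
    total = float(sum(rewards))
    for _ in range(n_explore, horizon):
        total += float(actions[best] @ sample_theta())
    return best, total

# Hypothetical usage: six axis-aligned actions in d = 3, Gaussian parameter noise.
rng = np.random.default_rng(0)
theta_bar = np.array([0.1, 0.9, 0.3])
actions = np.vstack([np.eye(3), -np.eye(3)])
best, total = explore_then_commit(
    actions,
    lambda: theta_bar + 0.05 * rng.standard_normal(3),
    horizon=1000,
    n_explore=60,
)
```

How long to explore is the main design choice in such a scheme; the value `n_explore=60` above is arbitrary and is not taken from the paper.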
Findings
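Regret upper bound of \(\widetilde{O}\big(\sqrt{d T \log(K/\delta)\, \sigma^2_{\max}}\big)\)
Lower bound of \(\widetilde{\Omega}\big(d \sqrt{T \sigma^2_{\max}}\big)\), tight up to logarithmic factors
Minimax regret for \(\ell_p\) unit balls with \(p \le 2\) is \(\widetilde{\Theta}\big(\sqrt{d T \sigma^2_q}\big)\)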
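A short worked check of when the upper and lower bounds coincide, assuming the upper bound has the form \(\widetilde{O}\big(\sqrt{d T \log(K/\delta)\, \sigma^2_{\max}}\big)\) and the lower bound the form \(\widetilde{\Omega}\big(d \sqrt{T \sigma^2_{\max}}\big)\), and treating \(\log(1/\delta)\) as a logarithmic factor:

```latex
\[
\sqrt{d\,T\,\log K\,\sigma^2_{\max}}
\;\approx\; \sqrt{d \cdot d \cdot T\,\sigma^2_{\max}}
\;=\; d\,\sqrt{T\,\sigma^2_{\max}}
\qquad \text{when } \log K \approx d .
\]
```

Under these assumptions the two bounds agree up to logarithmic factors for action sets whose size is exponential in the dimension; for much smaller \(K\) the upper bound is the smaller of the two expressions.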
Abstract
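We study the stochastic linear bandits with parameter noise model, in which the reward of action \(a\) is \(a^\top \theta\), where \(\theta\) is sampled i.i.d. We show a regret upper bound of \(\widetilde{O}\big(\sqrt{d T \log(K/\delta)\, \sigma^2_{\max}}\big)\) for a horizon \(T\), general action set of size \(K\) of dimension \(d\), and where \(\sigma^2_{\max}\) is the maximal variance of the reward for any action. We further provide a lower bound of \(\widetilde{\Omega}\big(d \sqrt{T \sigma^2_{\max}}\big)\), which is tight (up to logarithmic factors) whenever \(\log K \approx d\). For more specific action sets, \(\ell_p\) unit balls with \(p \le 2\) and dual norm \(q\), we show that the minimax regret is \(\widetilde{\Theta}\big(\sqrt{d T \sigma^2_q}\big)\), where \(\sigma^2_q\) is a variance-dependent quantity that is always at most \(4\). This is in contrast to the minimax regret attainable for such sets in the classic additive noise…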
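To make the role of the per-action reward variance concrete, the sketch below computes it for a small example. It relies on the standard identity that if the parameter has covariance matrix \(\Sigma\), the reward of action \(a\) has variance \(a^\top \Sigma a\). The helper name `reward_variances`, the Gaussian parameter distribution, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def reward_variances(actions, cov):
    """Return Var(<a, theta>) = a^T cov a for every row a of `actions`."""
    return np.einsum("kd,de,ke->k", actions, cov, actions)

# Hypothetical 2-dimensional example: two axis actions and the zero action.
actions = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
theta_mean = np.array([0.5, 0.2])
theta_cov = np.diag([0.04, 0.25])

variances = reward_variances(actions, theta_cov)
sigma2_max = variances.max()  # maximal reward variance over the action set

# Monte Carlo check: draw theta i.i.d. (Gaussian is an assumption made here)
# and compare the empirical reward variance of each action with a^T cov a.
rng = np.random.default_rng(0)
thetas = rng.multivariate_normal(theta_mean, theta_cov, size=200_000)
empirical = (thetas @ actions.T).var(axis=0)
```

In this example the zero action yields a reward with zero variance, whereas in an additive noise model every action would carry the same noise level. This action dependence of the variance is what distinguishes the parameter noise setting.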
Taxonomy
Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Stochastic Gradient Optimization Techniques
