Norm-Agnostic Linear Bandits
Spencer (Brady) Gales, Sunder Sethuraman, Kwang-Sung Jun

TL;DR
This paper introduces new linear bandit algorithms that do not require prior knowledge of the parameter norm bound, maintaining low regret even when such bounds are unknown or incorrect.
Contribution
The paper presents the first algorithms for linear bandits that operate effectively without knowing the norm bound of the unknown parameter, with proven regret bounds.
Findings
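The proposed algorithms achieve low regret without prior knowledge of the norm bound.
In the changing arm set setting, not knowing the norm bound leaves the leading term of the regret bound unchanged and inflates only the lower-order term.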
Standard algorithms can fail catastrophically when the norm bound assumption is violated.
Abstract
Linear bandits have a wide variety of applications, including recommendation systems, yet they make one strong assumption: the algorithms must know an upper bound $S$ on the norm of the unknown parameter $\theta^*$ that governs the reward generation. Such an assumption forces the practitioner to guess the $S$ involved in the confidence bound, leaving no choice but to wish that $\|\theta^*\| \le S$ is true to guarantee that the regret will be low. In this paper, we propose, for the first time, novel algorithms that do not require such knowledge. Specifically, we propose two algorithms and analyze their regret bounds: one for the changing arm set setting and the other for the fixed arm set setting. Our regret bound for the former shows that the price of not knowing $S$ does not affect the leading term in the regret bound and inflates only the lower order term. For the latter, we do not pay any…
Taxonomy
Topics: Advanced Bandit Algorithms Research
