Algorithms for Linear Bandits on Polyhedral Sets
Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama

TL;DR
This paper introduces nearly optimal algorithms for stochastic linear bandit problems within polyhedral sets, achieving logarithmic regret bounds and addressing open questions about efficient exploration strategies.
Contribution
The paper provides a nearly optimal algorithm with logarithmic regret bounds for linear bandits on polyhedral sets, resolving an open problem, and shows that on a polyhedron the optimal arm is robust to small perturbations in the reward.
Findings
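The lower bound on expected regret scales as $\Omega(N \log T)$
The proposed algorithm achieves expected regret of $O(N \log^{1+\epsilon}(T))$ for an arbitrarily small $\epsilon > 0$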
Algorithms perform well in finite time, matching theoretical bounds asymptotically.
Abstract
We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for the expected regret that scales as $\Omega(N \log T)$. We then provide a nearly optimal algorithm and show that its expected regret scales as $O(N \log^{1+\epsilon}(T))$ for an arbitrarily small $\epsilon > 0$. The algorithm alternates sequentially between exploration and exploitation intervals: a deterministic set of arms is played in the exploration intervals, and a greedily selected arm is played in the exploitation intervals. We also develop an algorithm that achieves the optimal regret when the sub-Gaussianity parameter of the noise term is known. Our key insight is that for a polyhedron the optimal arm is robust to small perturbations in the reward…
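The alternation between exploration and exploitation intervals described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name `explore_exploit`, the doubling exploitation schedule, the least-squares estimator, the Gaussian noise, and the representation of the polyhedron by an explicit vertex list are all assumptions made for illustration. The sketch relies only on the fact that a linear reward over a bounded polyhedron is maximized at a vertex, so the greedy step reduces to a search over vertices.

```python
import numpy as np

def explore_exploit(vertices, explore_arms, theta, n_cycles=8,
                    noise_std=0.1, seed=0):
    """Hypothetical explore/exploit loop; returns (greedy arm, pseudo-regret)."""
    rng = np.random.default_rng(seed)
    vertices = np.asarray(vertices, dtype=float)
    explore_arms = np.asarray(explore_arms, dtype=float)
    theta = np.asarray(theta, dtype=float)

    # Best achievable mean reward: a linear objective peaks at a vertex.
    best_reward = (vertices @ theta).max()

    played, rewards = [], []
    regret = 0.0
    greedy = None
    for cycle in range(n_cycles):
        # Exploration interval: play a fixed, deterministic set of arms.
        for arm in explore_arms:
            noisy = arm @ theta + noise_std * rng.standard_normal()
            played.append(arm)
            rewards.append(noisy)
            regret += best_reward - arm @ theta

        # Estimate the reward vector from exploration samples only
        # (an assumption of this sketch).
        theta_hat, *_ = np.linalg.lstsq(
            np.array(played), np.array(rewards), rcond=None)

        # Exploitation interval: play the greedy vertex for 2**cycle rounds
        # (assumed schedule; the paper's interval lengths may differ).
        greedy = vertices[np.argmax(vertices @ theta_hat)]
        regret += (2 ** cycle) * (best_reward - greedy @ theta)
    return greedy, regret

# Usage: unit square in 2 dimensions, exploring along the coordinate axes.
square = [[0, 0], [1, 0], [0, 1], [1, 1]]
arm, total_regret = explore_exploit(square, [[1, 0], [0, 1]], [1.0, -0.5])
```

Because the greedy vertex changes only when the estimate crosses a boundary between vertex optimality regions, a small estimation error leaves the chosen arm unchanged, which is the robustness property the abstract highlights.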
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Advanced Wireless Network Optimization
