Minimum Empirical Divergence for Sub-Gaussian Linear Bandits
Kapilan Balagopalan, Kwang-Sung Jun

TL;DR
This paper introduces LinMED, a novel linear bandit algorithm that uses minimum empirical divergence, offering a closed-form sampling probability computation and near-optimal regret bounds, with competitive empirical performance.
Contribution
LinMED extends the minimum empirical divergence (MED) approach to linear bandits and has closed-form arm sampling probabilities, which supports unbiased off-policy evaluation; it also comes with near-optimal regret guarantees.
Findings
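Achieves a near-optimal regret bound of $d\sqrt{n}$ up to logarithmic factors, where $d$ is the dimension and $n$ is the time horizon.
Provides a problem-dependent regret bound of order $d^2/\Delta$ up to logarithmic factors, where $\Delta$ is the smallest sub-optimality gap.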
Empirical results show competitive performance with state-of-the-art algorithms.
Abstract
We propose a novel linear bandit algorithm called LinMED (Linear Minimum Empirical Divergence), which is a linear extension of the MED algorithm that was originally designed for multi-armed bandits. LinMED is a randomized algorithm that admits a closed-form computation of the arm sampling probabilities, unlike the popular randomized algorithm called linear Thompson sampling. Such a feature proves useful for off-policy evaluation where the unbiased evaluation requires accurately computing the sampling probability. We prove that LinMED enjoys a near-optimal regret bound of $d\sqrt{n}$ up to logarithmic factors, where $d$ is the dimension and $n$ is the time horizon. We further show that LinMED enjoys a problem-dependent regret bound of order $d^2/\Delta$ up to logarithmic factors, where $\Delta$ is the smallest sub-optimality gap. Our empirical study shows that LinMED has a…
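To illustrate why exact sampling probabilities matter for off-policy evaluation, here is a minimal sketch of a generic inverse-propensity-weighted (IPW) estimator. It is not LinMED itself and is not taken from the paper: the function names `ipw_value` and `simulate`, the three-arm setup, and the fixed logging and target distributions are all hypothetical choices made for illustration. The point is only that the estimator divides by the logging probability of each chosen arm, so that probability has to be known accurately.

```python
import numpy as np


def ipw_value(rewards, logged_probs, target_probs):
    """Inverse-propensity-weighted estimate of a target policy's mean reward.

    rewards[t]      : reward observed at round t under the logging policy
    logged_probs[t] : probability the logging policy gave to the arm it chose
    target_probs[t] : probability the target policy gives to that same arm
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logged_probs, dtype=float)
    return float(np.mean(weights * rewards))


def simulate(n_rounds=20000, seed=0):
    """Toy check on a 3-armed problem (all numbers are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    means = np.array([0.2, 0.5, 0.8])      # hypothetical true mean rewards
    logging = np.array([0.5, 0.3, 0.2])    # logging policy: known in closed form
    target = np.array([0.1, 0.1, 0.8])     # policy we want to evaluate offline

    arms = rng.choice(3, size=n_rounds, p=logging)
    rewards = means[arms] + rng.normal(0.0, 0.1, size=n_rounds)

    estimate = ipw_value(rewards, logging[arms], target[arms])
    true_value = float(target @ means)
    return estimate, true_value


if __name__ == "__main__":
    est, true_value = simulate()
    print(f"IPW estimate: {est:.3f}, true value: {true_value:.3f}")
```

When the logging probabilities are exact, the IPW estimate is unbiased for the target policy's value. If they can only be approximated, for example by Monte Carlo sampling, the approximation error enters the importance weights directly.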
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Smart Grid Energy Management
