Minimum Empirical Divergence for Sub-Gaussian Linear Bandits

Kapilan Balagopalan; Kwang-Sung Jun

arXiv:2411.00229 · stat.ML · March 12, 2025

TL;DR

This paper introduces LinMED, a novel linear bandit algorithm that uses minimum empirical divergence, offering a closed-form sampling probability computation and near-optimal regret bounds, with competitive empirical performance.

Contribution

LinMED extends the minimum empirical divergence (MED) algorithm from multi-armed bandits to linear bandits. Its arm sampling probabilities have a closed form, which supports unbiased off-policy evaluation, and it comes with near-optimal regret guarantees.

Findings

01

Achieves a near-optimal regret bound of $d\sqrt{n}$ up to logarithmic factors, where $d$ is the dimension and $n$ is the time horizon.

02

Provides a problem-dependent regret bound of $\frac{d^{2}}{\Delta}\log^{2}(n)\log(\log(n))$, where $\Delta$ is the smallest sub-optimality gap.

03

Empirical results show competitive performance with state-of-the-art algorithms.

Abstract

We propose a novel linear bandit algorithm called LinMED (Linear Minimum Empirical Divergence), which is a linear extension of the MED algorithm that was originally designed for multi-armed bandits. LinMED is a randomized algorithm that admits a closed-form computation of the arm sampling probabilities, unlike the popular randomized algorithm called linear Thompson sampling. Such a feature proves useful for off-policy evaluation where the unbiased evaluation requires accurately computing the sampling probability. We prove that LinMED enjoys a near-optimal regret bound of $d\sqrt{n}$ up to logarithmic factors where $d$ is the dimension and $n$ is the time horizon. We further show that LinMED enjoys a $\frac{d^{2}}{\Delta}\left(\log^{2}(n)\right)\log(\log(n))$ problem-dependent regret where $\Delta$ is the smallest sub-optimality gap. Our empirical study shows that LinMED has a…
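The link between closed-form sampling probabilities and off-policy evaluation can be illustrated with a standard inverse propensity scoring (IPS) estimator. The sketch below is not taken from the paper or its repository: the function names `ips_estimate` and `simulate_logs`, the three-arm setup, and the fixed logging distribution are all assumptions made for illustration. It only shows why each logged reward has to be reweighted by the exact probability with which the logging policy chose the arm.

```python
import random


def ips_estimate(logs, target_probs):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    logs: list of (arm, reward, logging_prob) tuples, where logging_prob is
          the probability with which the logging policy chose that arm.
    target_probs: target_probs[a] is the probability that the target policy
          would choose arm a (hypothetical, context-free policy).
    """
    total = 0.0
    for arm, reward, logging_prob in logs:
        # Importance weight: target probability over logging probability.
        total += (target_probs[arm] / logging_prob) * reward
    return total / len(logs)


def simulate_logs(logging_probs, mean_rewards, n, seed=0):
    """Generate logged data from a fixed logging distribution (an assumption).

    A randomized bandit algorithm would change its distribution every round;
    what matters for IPS is that the exact probability is recorded each time.
    """
    rng = random.Random(seed)
    arms = list(range(len(logging_probs)))
    logs = []
    for _ in range(n):
        arm = rng.choices(arms, weights=logging_probs)[0]
        reward = mean_rewards[arm] + rng.gauss(0.0, 0.1)  # noisy reward
        logs.append((arm, reward, logging_probs[arm]))
    return logs


if __name__ == "__main__":
    mean_rewards = [0.2, 0.5, 0.8]   # illustrative values only
    logging_probs = [0.5, 0.3, 0.2]  # illustrative values only
    logs = simulate_logs(logging_probs, mean_rewards, n=20000)
    # Evaluate a target policy that always plays arm 2 (true value 0.8).
    print(ips_estimate(logs, target_probs=[0.0, 0.0, 1.0]))
```

If the logged probabilities were only approximated, for example by Monte Carlo sampling from a posterior, the importance weights would be off and the estimate would generally be biased, which is the issue the abstract points to.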

Code & Models

Repositories

Kapilan-Balagopalan/Linear-Bandit-Algorithms (Official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Smart Grid Energy Management