Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors

Prateek Jaiswal; Debdeep Pati; Anirban Bhattacharya; Bani K. Mallick

arXiv:2309.06349 · stat.ML · September 13, 2023

Open Access

TL;DR

This paper introduces a fractional-posterior variant of Thompson sampling, called $\alpha$-TS, and provides new regret bounds for stochastic multi-armed bandits under mild conditions, expanding the theoretical understanding of Bayesian algorithms.

Contribution

It develops a generalized regret analysis for $\alpha$-TS using fractional posteriors, with bounds that apply to broad classes of reward models and priors and do not require conjugacy.

Findings

01

Achieves both instance-dependent and instance-independent regret bounds.

02

Matches the regret bounds of improved UCB algorithms.

03

Applicable to sub-Gaussian and exponential family reward models.

Abstract

Thompson sampling (TS) is one of the earliest and most popular algorithms for solving stochastic multi-armed bandit problems. We consider a variant of TS, named $\alpha$-TS, where we use a fractional or $\alpha$-posterior ($\alpha \in (0, 1)$) instead of the standard posterior distribution. To compute an $\alpha$-posterior, the likelihood in the definition of the standard posterior is tempered with a factor $\alpha$. For $\alpha$-TS we obtain both instance-dependent $O\left(\sum_{k \neq i^{*}} \Delta_{k} \left(\frac{\log(T)}{C(\alpha) \Delta_{k}^{2}} + \frac{1}{2}\right)\right)$ and instance-independent $O\left(\sqrt{K T \log K}\right)$ frequentist regret bounds under very mild conditions on the prior and reward distributions, where $\Delta_{k}$ is the gap between the true mean rewards of the $k$-th and the best arms, and $C(\alpha)$ is a known constant. Both the sub-Gaussian and exponential…
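To make the tempering step concrete, here is a minimal sketch of $\alpha$-TS in Python. It is not the authors' implementation. It assumes Gaussian rewards with known standard deviation `sigma` and an independent Gaussian prior on each arm mean, chosen only because the $\alpha$-posterior is then available in closed form; the paper's analysis does not need conjugacy. In this setting, raising the likelihood to the power $\alpha$ simply scales the data's contribution to the posterior precision by $\alpha$. The function names, the default `alpha=0.5`, and the prior parameters are illustrative assumptions.

```python
import numpy as np


def alpha_posterior_gaussian(sum_x, n, alpha, sigma=1.0,
                             prior_mean=0.0, prior_std=1.0):
    """Mean and std of the alpha-posterior of a Gaussian mean.

    Assumed model (illustrative): rewards ~ N(theta, sigma^2) with known
    sigma, prior theta ~ N(prior_mean, prior_std^2). Tempering the
    likelihood by alpha multiplies the data precision n / sigma^2 by alpha.
    """
    precision = 1.0 / prior_std**2 + alpha * n / sigma**2
    mean = (prior_mean / prior_std**2 + alpha * sum_x / sigma**2) / precision
    return mean, np.sqrt(1.0 / precision)


def alpha_ts(true_means, horizon, alpha=0.5, sigma=1.0, seed=0):
    """Run alpha-TS on a simulated Gaussian bandit.

    Returns the cumulative pseudo-regret and the pull count of each arm.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    num_arms = len(true_means)
    counts = np.zeros(num_arms)   # number of pulls per arm
    sums = np.zeros(num_arms)     # sum of observed rewards per arm
    best_mean = true_means.max()
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its alpha-posterior.
        post_mean, post_std = alpha_posterior_gaussian(sums, counts, alpha, sigma)
        samples = rng.normal(post_mean, post_std)
        # Play the arm whose sampled mean is largest.
        arm = int(np.argmax(samples))
        reward = rng.normal(true_means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - true_means[arm]

    return regret, counts


if __name__ == "__main__":
    regret, counts = alpha_ts([0.1, 0.5, 0.9], horizon=2000, alpha=0.5)
    print("cumulative pseudo-regret:", regret)
    print("pulls per arm:", counts)
```

Setting `alpha=1.0` in this sketch recovers the standard Gaussian posterior, so the same code can be used to compare standard TS with its tempered variant on a simulated instance.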

Taxonomy

Topics: Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms

Methods: Spatio-temporal stability analysis