A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

Joel Q. L. Chang; Vincent Y. F. Tan

arXiv:2108.11345 · cs.LG · April 19, 2022

TL;DR

This paper develops a unified theoretical framework for risk-averse Thompson sampling algorithms in multi-armed bandits, providing new concentration bounds and proving asymptotic optimality for various risk measures.

Contribution

It introduces a general analytical toolkit for risk functionals, proving asymptotic optimality of risk-averse Thompson sampling algorithms for multiple risk measures.
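A risk functional maps a reward distribution to a single number, and a risk-averse bandit ranks arms by that number rather than by the mean. The sketch below evaluates three of the functionals named in the abstract on a finite-support distribution. It is illustrative only. The function names are hypothetical, and the conventions are our assumptions, not taken from the paper: rewards are to be maximised, CVaR is the lower-tail average, mean-variance is E[X] - γ·Var(X), and proportional hazard is the integral of P(X > x)^r over x ≥ 0 for nonnegative rewards. The paper may parameterise any of these differently.

```python
def cvar(support, probs, alpha):
    """Lower-tail CVaR_alpha: mean reward over the worst alpha-fraction of outcomes.

    Assumed convention: rewards are maximised, so the lower tail is the risky one.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    remaining, total = alpha, 0.0
    for y, p in sorted(zip(support, probs)):  # walk outcomes from worst to best
        take = min(p, remaining)              # tail mass still needed from this atom
        total += take * y
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


def mean_variance(support, probs, gamma):
    """Assumed convention: E[X] - gamma * Var(X), larger is better."""
    mean = sum(p * y for y, p in zip(support, probs))
    var = sum(p * (y - mean) ** 2 for y, p in zip(support, probs))
    return mean - gamma * var


def proportional_hazard(support, probs, r):
    """Assumed convention: integral over x >= 0 of P(X > x) ** r, for nonnegative X.

    r = 1 recovers the mean; r > 1 gives a value at most the mean (risk-averse for rewards).
    """
    if min(support) < 0:
        raise ValueError("this sketch assumes nonnegative rewards")
    value, prev, survival = 0.0, 0.0, 1.0
    for y, p in sorted(zip(support, probs)):
        # On [prev, y) the survival function P(X > x) is constant and equals `survival`.
        value += (y - prev) * max(survival, 0.0) ** r
        survival -= p
        prev = y
    return value
```

For example, on support [0, 0.5, 1] with probabilities [0.2, 0.3, 0.5], the mean is 0.65. CVaR at α = 0.4 is 0.25, and the proportional-hazard value with r = 2 is 0.445. Each functional discounts the distribution's lower tail more heavily than the mean does.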

Findings

01

Asymptotically optimal regret bounds for CVaR and other risk measures.

02

Generalized concentration bounds for continuous and dominant risk functionals.

03

Numerical simulations indicate that the regret upper bounds are reasonably tight.

Abstract

This paper unifies the design and the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem for a class of risk functionals $\rho$ that are continuous and dominant. We prove generalised concentration bounds for these continuous and dominant risk functionals and show that many popular risk functionals belong to this class. Using our newly developed analytical toolkit, we analyse the algorithm $\rho$-MTS (for multinomial distributions) and prove that it admits asymptotically optimal regret bounds among risk-averse algorithms under CVaR, proportional hazard, and other ubiquitous risk measures. More generally, we prove the asymptotic optimality of $\rho$-MTS for Bernoulli distributions for a class of risk measures known as empirical distribution performance measures (EDPMs), which includes the well-known mean-variance. Numerical simulations show…
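The abstract names ρ-MTS but does not spell out its steps. We read the "M" as multinomial. Under that reading, each arm's reward lives on a known finite support, the posterior over its probability vector is Dirichlet, and each round the learner draws one distribution per arm and plays the arm whose sampled distribution has the largest ρ. The sketch below follows that reading with ρ set to lower-tail CVaR. Several details are our own assumptions: the uniform Dir(1, ..., 1) prior, the larger-is-better convention, and the ρ-regret definition (the sum over rounds of the best arm's ρ minus the played arm's ρ). The names `rho_mts`, `cvar` and `sample_dirichlet` are hypothetical and do not come from the joel-ql-chang/continuous-rho-ts repository.

```python
import random


def cvar(support, probs, alpha):
    """Lower-tail CVaR_alpha of a finite-support reward distribution (assumed convention)."""
    remaining, total = alpha, 0.0
    for y, p in sorted(zip(support, probs)):  # worst outcomes first
        take = min(p, remaining)
        total += take * y
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


def sample_dirichlet(alphas, rng):
    """One Dirichlet draw via normalised Gamma(alpha_i, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def rho_mts(support, true_probs, horizon, rho, seed=0):
    """Hypothetical sketch of rho-MTS on a shared finite support.

    Returns the sequence of pulled arms and the cumulative rho-regret.
    """
    rng = random.Random(seed)
    k, m = len(true_probs), len(support)
    counts = [[0] * m for _ in range(k)]  # observed outcome counts per arm
    true_rho = [rho(support, p) for p in true_probs]
    best = max(true_rho)
    pulls, regret = [], 0.0
    for _ in range(horizon):
        # Sample a distribution from each arm's Dirichlet(counts + 1) posterior.
        scores = [rho(support, sample_dirichlet([c + 1 for c in counts[i]], rng))
                  for i in range(k)]
        arm = max(range(k), key=scores.__getitem__)
        # Observe an outcome index from the chosen arm's true distribution.
        outcome = rng.choices(range(m), weights=true_probs[arm])[0]
        counts[arm][outcome] += 1
        pulls.append(arm)
        regret += best - true_rho[arm]
    return pulls, regret


if __name__ == "__main__":
    support = [0.0, 0.5, 1.0]
    arms = [[0.0, 1.0, 0.0], [0.4, 0.0, 0.6]]
    pulls, regret = rho_mts(support, arms, 2000, lambda s, p: cvar(s, p, 0.2))
    print(pulls.count(0), pulls.count(1), regret)
```

In the usage example, arm 1 has the higher mean (0.6 versus 0.5) but puts 40% of its mass on a zero reward. Its CVaR at α = 0.2 is therefore 0, while arm 0's is 0.5. A CVaR-driven sampler should concentrate its pulls on arm 0, and each pull of arm 1 adds 0.5 to the ρ-regret.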

Code & Models

Repositories

joel-ql-chang/continuous-rho-ts (official)

Videos

A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits · Underline

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Risk and Portfolio Optimization