A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits
Joel Q. L. Chang, Vincent Y. F. Tan

TL;DR
This paper develops a unified theoretical framework for risk-averse Thompson sampling algorithms in multi-armed bandits, providing new concentration bounds and proving asymptotic optimality for various risk measures.
Contribution
It introduces a general analytical toolkit for risk functionals, proving asymptotic optimality of risk-averse Thompson sampling algorithms for multiple risk measures.
Findings
Asymptotically optimal regret bounds for CVaR and other risk measures.
Generalized concentration bounds for continuous and dominant risk functionals.
Numerical simulations confirm tight regret bounds.
Abstract
This paper unifies the design and the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem for a class of risk functionals ρ that are continuous and dominant. We prove generalised concentration bounds for these continuous and dominant risk functionals and show that a wide class of popular risk functionals belongs to this class. Using our newly developed analytical toolkits, we analyse the algorithm ρ-MTS (for multinomial distributions) and prove that it admits asymptotically optimal regret bounds for risk-averse algorithms under CVaR, proportional hazard, and other ubiquitous risk measures. More generally, we prove the asymptotic optimality of ρ-MTS for Bernoulli distributions for a class of risk measures known as empirical distribution performance measures (EDPMs); this includes the well-known mean-variance. Numerical simulations show…
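To make the idea of Thompson sampling under a risk functional concrete, here is a minimal Python sketch for arms with multinomial rewards on a shared finite support. It is an illustration under stated assumptions, not the paper's pseudocode: the function names `cvar_lower_tail` and `risk_averse_mts`, the uniform Dirichlet prior, and the convention that CVaR is the mean of the worst `alpha` fraction of rewards (to be maximised) are all choices made here for the example.

```python
import numpy as np


def cvar_lower_tail(support, probs, alpha):
    """Mean of the worst alpha-fraction of rewards of a discrete distribution.

    Assumed convention: rewards, lower tail, larger value is better.
    """
    order = np.argsort(support)
    x = np.asarray(support, dtype=float)[order]
    p = np.asarray(probs, dtype=float)[order]
    # Probability mass strictly below each atom (atoms sorted ascending).
    cum_before = np.concatenate(([0.0], np.cumsum(p)[:-1]))
    # Portion of each atom's mass that lies inside the lower alpha-tail.
    tail_mass = np.clip(alpha - cum_before, 0.0, p)
    return float(np.dot(tail_mass, x) / alpha)


def risk_averse_mts(arm_probs, support, alpha, horizon, seed=0):
    """Thompson sampling with Dirichlet posteriors and a CVaR index.

    arm_probs: (n_arms, n_atoms) true multinomial parameters, used only
    to simulate rewards. Returns the number of pulls of each arm.
    """
    rng = np.random.default_rng(seed)
    arm_probs = np.asarray(arm_probs, dtype=float)
    n_arms, n_atoms = arm_probs.shape
    # Dirichlet(1, ..., 1) prior for every arm (an assumption of this sketch).
    counts = np.ones((n_arms, n_atoms))
    pulls = np.zeros(n_arms, dtype=int)
    for _ in range(horizon):
        # Draw one candidate reward distribution per arm from its posterior.
        sampled = [rng.dirichlet(c) for c in counts]
        # Score each sampled distribution with the risk functional.
        scores = [cvar_lower_tail(support, q, alpha) for q in sampled]
        arm = int(np.argmax(scores))
        # Observe a reward atom from the chosen arm and update its posterior.
        atom = rng.choice(n_atoms, p=arm_probs[arm])
        counts[arm, atom] += 1
        pulls[arm] += 1
    return pulls
```

For instance, with support `[0.0, 0.5, 1.0]`, an arm that always pays 0.5 and an arm that pays 0 or 1 with equal probability have the same mean, but the first has the higher lower-tail CVaR at `alpha = 0.5`, so calling `risk_averse_mts([[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]], [0.0, 0.5, 1.0], 0.5, 500)` should concentrate pulls on the deterministic arm. Swapping `cvar_lower_tail` for another functional of the sampled distribution changes the risk measure without touching the sampling loop.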
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Risk and Portfolio Optimization
