Order Optimal Regret Bounds for Sharpe Ratio Optimization under Thompson Sampling

Mohammad Taha Shah; Sabrina Khurshid; Gourab Ghatak

arXiv:2508.13749 · cs.LG · April 2, 2026

TL;DR

This paper introduces a Bayesian Thompson Sampling algorithm for Sharpe ratio optimization in multi-armed bandits, proves order-optimal regret bounds, and reports improved empirical performance over existing risk-aware bandit algorithms.

Contribution

The paper develops SRTS, a risk-aware Thompson Sampling algorithm with a unified approach for different risk regimes, and provides theoretical guarantees of its optimality.

Findings

01

Achieves an $\tilde{O}(\log n)$ regret bound for SR optimization.

02

Provides a matching lower bound, establishing order-optimality.

03

Shows improved empirical performance over existing risk-aware bandit algorithms.

Abstract

In this paper, we study sequential decision-making for maximizing the Sharpe ratio (SR) in a stochastic multi-armed bandit (MAB) setting. Unlike standard bandit formulations that maximize cumulative reward, SR optimization requires balancing expected return and reward variability. As a result, the learning objective depends jointly on the mean and variance of the reward distribution and takes a fractional form. To address this problem, we propose Sharpe Ratio Thompson Sampling (SRTS), a Bayesian algorithm for risk-adjusted exploration. For Gaussian reward models, the algorithm employs a Normal-Gamma conjugate posterior to capture uncertainty in both the mean and the precision of each arm. In contrast to additive mean-variance (MV) formulations, which often require different algorithms across risk regimes, the fractional SR objective yields a single sampling rule that applies…
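
The following is a minimal sketch of how Thompson Sampling with a Normal-Gamma posterior could be applied to a Sharpe ratio objective. It is not the authors' SRTS implementation. It assumes the plain ratio of sampled mean to sampled standard deviation as the score (the paper's exact SR definition may differ, for example by including a regularization term), and the class name `NormalGammaArm`, the function `run_sr_thompson`, and the prior hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


class NormalGammaArm:
    """Normal-Gamma posterior over the (mean, precision) of one Gaussian arm."""

    def __init__(self, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
        # Prior hyperparameters are illustrative assumptions, not taken from the paper.
        self.mu, self.kappa, self.alpha, self.beta = mu0, kappa0, alpha0, beta0

    def update(self, x):
        # Standard conjugate update for a single observation x.
        mu_new = (self.kappa * self.mu + x) / (self.kappa + 1.0)
        self.beta += 0.5 * self.kappa * (x - self.mu) ** 2 / (self.kappa + 1.0)
        self.alpha += 0.5
        self.kappa += 1.0
        self.mu = mu_new

    def sample(self, rng):
        # Draw precision tau ~ Gamma(alpha, rate=beta),
        # then mean | tau ~ Normal(mu, 1 / (kappa * tau)).
        tau = rng.gamma(shape=self.alpha, scale=1.0 / self.beta)
        mean = rng.normal(self.mu, 1.0 / np.sqrt(self.kappa * tau))
        return mean, tau


def run_sr_thompson(true_means, true_stds, horizon, seed=0):
    """Play `horizon` rounds, pulling the arm with the largest sampled mean/std ratio."""
    rng = np.random.default_rng(seed)
    arms = [NormalGammaArm() for _ in true_means]
    pulls = np.zeros(len(true_means), dtype=int)
    for _ in range(horizon):
        scores = []
        for arm in arms:
            mean, tau = arm.sample(rng)
            # Assumed score: sampled mean divided by sampled standard deviation.
            scores.append(mean * np.sqrt(tau))
        k = int(np.argmax(scores))
        reward = rng.normal(true_means[k], true_stds[k])
        arms[k].update(reward)
        pulls[k] += 1
    return pulls


if __name__ == "__main__":
    # Arm 0: lower mean, much lower variance (true ratio 5.0).
    # Arm 1: higher mean, high variance (true ratio 0.6).
    print(run_sr_thompson([1.0, 1.2], [0.2, 2.0], horizon=500))
```

In this toy setup the second arm has the higher mean but the first has the higher mean-to-standard-deviation ratio, so a mean-maximizing bandit and an SR-maximizing bandit would prefer different arms. The same sampling rule is used regardless of how the means and variances compare, which mirrors the "single sampling rule" point made in the abstract.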
