Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

Yifan Lin; Yuhao Wang; Enlu Zhou

arXiv:2206.12463 · cs.LG · June 28, 2022


Open Access

TL;DR

This paper studies a risk-averse version of the contextual multi-armed bandit problem with linear payoffs, proposing algorithms with regret bounds and demonstrating their effectiveness in portfolio selection.

Contribution

It introduces a risk-averse framework for contextual bandits based on the mean-variance criterion and provides a regret analysis for a Thompson Sampling-based algorithm.
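To make the criterion concrete, here is a minimal sketch that ranks arms by empirical mean-variance. It assumes the form $\mathrm{MV} = \rho\,\mu - \sigma^2$, a common convention in the risk-averse bandit literature in which a larger risk tolerance $\rho$ puts more weight on the mean. The paper's exact definition may differ, and the helper names are hypothetical.

```python
import numpy as np


def mean_variance_score(rewards, rho):
    """Empirical mean-variance score, ASSUMING MV = rho * mean - variance."""
    rewards = np.asarray(rewards, dtype=float)
    # np.var uses the population variance (ddof=0) by default
    return rho * rewards.mean() - rewards.var()


def best_arm(reward_samples, rho):
    """Index of the arm with the largest mean-variance score (hypothetical helper)."""
    scores = [mean_variance_score(r, rho) for r in reward_samples]
    return int(np.argmax(scores))


# Arm 0 always pays 1; arm 1 pays 0 or 4, so its mean is higher but it is riskier.
samples = [[1.0, 1.0, 1.0], [0.0, 4.0]]
print(best_arm(samples, rho=1.0))   # scores 1 vs 2 - 4 = -2, prints 0
print(best_arm(samples, rho=10.0))  # scores 10 vs 20 - 4 = 16, prints 1
```

The same two arms swap order as $\rho$ grows, which is the sense in which $\rho$ acts as a risk tolerance.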

Findings

1. A regret bound of $O\big((1+\rho+\frac{1}{\rho})\, d \ln T \ln\frac{K}{\delta}\sqrt{d K T^{1+2\epsilon}\ln\frac{K}{\delta}\frac{1}{\epsilon}}\big)$ that holds with probability $1-\delta$.
2. Empirical results demonstrate the algorithm's effectiveness in portfolio selection.
3. The approach balances risk and reward in sequential decision-making.
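To see how the terms of the bound scale, the sketch below evaluates the expression inside the $O(\cdot)$ with all constants dropped, so only ratios between calls are meaningful. The function name and the example parameter values are illustrative assumptions, not taken from the paper.

```python
import math


def regret_bound_expr(T, K, d, rho, delta, eps):
    """Expression inside the O(.) of the stated bound, constants dropped."""
    assert 0 < eps < 0.5 and 0 < delta < 1 and rho > 0
    risk_factor = 1 + rho + 1 / rho  # smallest at rho = 1, where it equals 3
    log_k = math.log(K / delta)
    return (risk_factor * d * math.log(T) * log_k
            * math.sqrt(d * K * T ** (1 + 2 * eps) * log_k / eps))


args = dict(K=5, d=10, delta=0.05, eps=0.1)

# The risk factor is the only rho-dependent term: (1 + 2 + 1/2) / 3 = 3.5 / 3.
ratio = regret_bound_expr(1e6, rho=2.0, **args) / regret_bound_expr(1e6, rho=1.0, **args)
print(round(ratio, 4))  # prints 1.1667

# The bound grows like T^(1/2 + eps) * ln T, so average regret per round shrinks.
per_round_small = regret_bound_expr(1e4, rho=1.0, **args) / 1e4
per_round_large = regret_bound_expr(1e8, rho=1.0, **args) / 1e8
print(per_round_large < per_round_small)  # prints True
```

The parameter $\epsilon$ trades off two terms: a smaller $\epsilon$ lowers the exponent on $T$ but inflates the $\frac{1}{\epsilon}$ factor under the square root.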

Abstract

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson Sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\big((1 + \rho + \frac{1}{\rho})\, d \ln T \ln \frac{K}{\delta} \sqrt{d K T^{1 + 2\epsilon} \ln \frac{K}{\delta} \frac{1}{\epsilon}}\big)$ that holds with probability $1 - \delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0 < \epsilon < \frac{1}{2}$,…
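As an illustration of Thompson Sampling in the disjoint model (each arm $k$ has its own parameter vector), here is a minimal sketch under loudly stated assumptions. It is not the paper's algorithm or its analyzed variant. Each arm keeps a Bayesian ridge-regression posterior; a parameter is sampled from it, and the arm is scored as $\rho$ times the sampled mean minus a plug-in residual-variance estimate (the same assumed $\rho\,\mu - \sigma^2$ form). The class name, the posterior scale `v`, the variance estimator, and the toy environment are all hypothetical.

```python
import numpy as np


class DisjointMVThompson:
    """Hypothetical sketch: Thompson Sampling on per-arm (disjoint) linear models,
    scoring arms by rho * sampled mean - plug-in variance estimate (assumed form)."""

    def __init__(self, n_arms, dim, rho, v=1.0, seed=0):
        self.rho = rho
        self.v = v  # assumed posterior scale; not taken from the paper
        self.rng = np.random.default_rng(seed)
        self.B = [np.eye(dim) for _ in range(n_arms)]    # per-arm precision matrix
        self.f = [np.zeros(dim) for _ in range(n_arms)]  # per-arm sum of r * x
        self.hist = [[] for _ in range(n_arms)]          # per-arm (x, r) history

    def _variance(self, k, theta_hat):
        # Plug-in noise variance: spread of residuals around the fitted mean.
        # With fewer than 2 observations return 0, which favours exploring the arm.
        if len(self.hist[k]) < 2:
            return 0.0
        resid = [r - x @ theta_hat for x, r in self.hist[k]]
        return float(np.var(resid))

    def choose(self, contexts):
        scores = []
        for k, x in enumerate(contexts):
            cov = np.linalg.inv(self.B[k])
            theta_hat = cov @ self.f[k]  # posterior (ridge) mean
            theta_tilde = self.rng.multivariate_normal(theta_hat, self.v ** 2 * cov)
            scores.append(self.rho * (x @ theta_tilde) - self._variance(k, theta_hat))
        return int(np.argmax(scores))

    def update(self, k, x, r):
        self.B[k] += np.outer(x, x)
        self.f[k] += r * x
        self.hist[k].append((x, r))


def simulate(T=500, rho=1.0, seed=1):
    """Toy environment (hypothetical): arm 1 has the higher mean but far more noise."""
    rng = np.random.default_rng(seed)
    theta = [np.array([1.0, 0.0]), np.array([1.2, 0.0])]  # true means 1.0 and 1.2
    noise_sd = [0.1, 2.0]
    agent = DisjointMVThompson(n_arms=2, dim=2, rho=rho, seed=seed + 1)
    counts = [0, 0]
    for _ in range(T):
        contexts = [np.array([1.0, rng.uniform()]) for _ in range(2)]
        k = agent.choose(contexts)
        r = contexts[k] @ theta[k] + noise_sd[k] * rng.standard_normal()
        agent.update(k, contexts[k], r)
        counts[k] += 1
    return counts


print(simulate())  # pull counts per arm; the low-variance arm 0 should dominate
```

Under the assumed form, arm 0 scores $\rho - 0.01$ and arm 1 scores $1.2\rho - 4$, so with $\rho = 1$ the safer arm is best; the true ranking only flips once $\rho$ exceeds about 20.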


Taxonomy

Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Optimization and Search Problems