Continuous Mean-Covariance Bandits

Yihan Du; Siwei Wang; Zhixuan Fang; Longbo Huang

arXiv:2102.12090 · cs.LG · May 12, 2023

Open Access

TL;DR

This paper introduces the Continuous Mean-Covariance Bandit (CMCB) model, which accounts for correlations between options in risk-aware online decision making. It proposes algorithms that are optimal up to logarithmic factors and analyzes their theoretical performance.

Contribution

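The paper is the first to explicitly incorporate option correlation into risk-aware bandits. It develops algorithms with provably optimal regret bounds (within logarithmic factors) under all three feedback settings it considers: full-information, semi-bandit, and full-bandit.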
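Regret here can be read as the cumulative gap between the learner's choices and the best fixed weight vector. The formalization below is an assumption for illustration, since this page does not state the objective. The notation is ours: θ* is the unknown mean reward vector, Σ* the unknown covariance matrix of the option rewards, ρ ≥ 0 a risk-aversion weight, and Δ_d the set of weight vectors over d options.

```latex
\begin{aligned}
f(w) &= w^\top \theta^* - \rho\, w^\top \Sigma^* w,
  \qquad w \in \Delta_d = \Big\{ w \in [0,1]^d : \textstyle\sum_{i=1}^{d} w_i = 1 \Big\},\\
w^* &= \arg\max_{w \in \Delta_d} f(w),
  \qquad \mathcal{R}(T) = \sum_{t=1}^{T} \big( f(w^*) - f(w_t) \big).
\end{aligned}
```

Under this reading, "optimal within logarithmic factors" means an upper bound on \mathcal{R}(T) that matches the best achievable rate up to log terms. The off-diagonal entries of Σ* enter f(w) directly through the quadratic term, and this is where option correlation affects the trade-off.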

Findings

01

The proposed algorithms achieve optimal regret bounds within logarithmic factors.

02

Experiments demonstrate the superiority of the proposed algorithms.

03

The analysis shows how the covariance structure affects learning performance.

Abstract

Existing risk-aware multi-armed bandit models typically focus on risk measures of individual options, such as variance. As a result, they cannot be directly applied to important real-world online decision making problems with correlated options. In this paper, we propose a novel Continuous Mean-Covariance Bandit (CMCB) model to explicitly take option correlation into account. Specifically, in CMCB, a learner sequentially chooses weight vectors over the given options and observes random feedback according to these decisions. The learner's objective is to achieve the best trade-off between reward and risk, where risk is measured with the option covariance. To capture different reward observation scenarios in practice, we consider three feedback settings: full-information, semi-bandit, and full-bandit feedback. We propose novel algorithms with optimal regrets (within logarithmic factors), and provide…
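The following minimal Python sketch makes the setup concrete under stated assumptions. It assumes the reward-risk utility has the mean-covariance form f(w) = w^T θ − ρ w^T Σ w with a risk-aversion weight ρ. It also assumes that the three feedback settings reveal, respectively, the whole reward vector, only the rewards of options with positive weight, and only the weighted sum w^T r. The function names (`mean_covariance_utility`, `variance_only_risk`, `observe`, `regret`) are hypothetical. This is not the authors' algorithm: it shows what the learner evaluates and observes, not how it learns.

```python
import numpy as np


def mean_covariance_utility(w, theta, sigma, rho):
    """Assumed reward-risk utility: expected reward w^T theta minus rho * w^T Sigma w."""
    return float(w @ theta - rho * (w @ sigma @ w))


def variance_only_risk(w, sigma):
    """Risk if option correlations are ignored: only the diagonal (individual variances) counts."""
    return float((w ** 2) @ np.diag(sigma))


def observe(w, reward_vector, setting):
    """Return what the learner sees after playing weights w (assumed feedback models)."""
    if setting == "full-information":
        # The entire random reward vector is revealed.
        return reward_vector.copy()
    if setting == "semi-bandit":
        # Only options with positive weight are revealed; the rest are hidden as NaN.
        obs = np.full_like(reward_vector, np.nan, dtype=float)
        chosen = w > 0
        obs[chosen] = reward_vector[chosen]
        return obs
    if setting == "full-bandit":
        # Only the scalar weighted reward w^T r is revealed.
        return float(w @ reward_vector)
    raise ValueError(f"unknown feedback setting: {setting}")


def regret(weights_played, w_star, theta, sigma, rho):
    """Cumulative utility gap between a fixed comparator w_star and the weights actually played."""
    best = mean_covariance_utility(w_star, theta, sigma, rho)
    return sum(best - mean_covariance_utility(w, theta, sigma, rho) for w in weights_played)


if __name__ == "__main__":
    theta = np.array([1.0, 1.0])  # identical means (made-up numbers)
    w = np.array([0.5, 0.5])      # equal split between the two options
    for corr in (0.9, -0.9):
        sigma = np.array([[1.0, corr], [corr, 1.0]])  # identical unit variances
        print(f"corr={corr:+.1f}  utility={mean_covariance_utility(w, theta, sigma, 1.0):.2f}  "
              f"variance-only risk={variance_only_risk(w, sigma):.2f}")
```

The demo uses two options with identical means and variances, which illustrates the motivation stated in the abstract. With ρ = 1, the covariance-based utility of the equal split differs sharply between the two correlation structures (about 0.05 versus 0.95), while the variance-only risk is 0.5 in both cases. A model that ignores correlation cannot tell these situations apart.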

Videos

Continuous Mean-Covariance Bandits (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management