Continuous Mean-Covariance Bandits
Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

TL;DR
This paper introduces the Continuous Mean-Covariance Bandit (CMCB) model, which accounts for correlations between options in risk-aware online decision making, and proposes algorithms whose regret is optimal up to logarithmic factors under three feedback settings.
Contribution
It is the first to explicitly incorporate option correlation in risk-aware bandits and to develop algorithms whose regret bounds are provably optimal, up to logarithmic factors, under full-information, semi-bandit and full-bandit feedback.
Findings
The proposed algorithms achieve regret bounds that are optimal up to logarithmic factors.
Experiments validate the algorithms' superiority.
The analysis shows how covariance structures affect learning performance.
Abstract
Existing risk-aware multi-armed bandit models typically focus on risk measures of individual options such as variance. As a result, they cannot be directly applied to important real-world online decision-making problems with correlated options. In this paper, we propose a novel Continuous Mean-Covariance Bandit (CMCB) model to explicitly take into account option correlation. Specifically, in CMCB, there is a learner who sequentially chooses weight vectors on given options and observes random feedback according to the decisions. The learner's objective is to achieve the best trade-off between reward and risk, measured with option covariance. To capture different reward observation scenarios in practice, we consider three feedback settings, i.e., full-information, semi-bandit and full-bandit feedback. We propose novel algorithms with optimal regrets (within logarithmic factors), and provide…
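To make the reward-risk trade-off and the three feedback settings concrete, here is a minimal sketch in plain Python. It rests on assumptions that the summary above does not state: the objective is taken to be the expected reward of a weight vector minus a risk-aversion coefficient `rho` times the weighted covariance, and semi-bandit feedback is interpreted as observing only the options that receive positive weight. The names `mean_covariance_value`, `feedback`, and `rho` are hypothetical and chosen for illustration; this is not the authors' algorithm.

```python
from typing import List, Optional, Union

Vector = List[float]
Matrix = List[List[float]]


def mean_covariance_value(w: Vector, mean: Vector, cov: Matrix, rho: float) -> float:
    """Assumed objective: expected reward w^T mean minus rho * w^T cov w."""
    expected_reward = sum(wi * mi for wi, mi in zip(w, mean))
    n = len(w)
    risk = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return expected_reward - rho * risk


def feedback(
    w: Vector, sample: Vector, setting: str
) -> Union[List[Optional[float]], float]:
    """What the learner observes for one round's sampled reward vector.

    The semi-bandit rule (positive weight => observed) is an assumption.
    """
    if setting == "full-information":
        # Every option's random reward is revealed.
        return list(sample)
    if setting == "semi-bandit":
        # Only options with positive weight are revealed; others are None.
        return [x if wi > 0 else None for wi, x in zip(w, sample)]
    if setting == "full-bandit":
        # Only the weighted sum of rewards is revealed.
        return sum(wi * x for wi, x in zip(w, sample))
    raise ValueError(f"unknown feedback setting: {setting}")


# Two options with equal means and negative correlation (illustrative numbers).
mean = [1.0, 1.0]
cov = [[1.0, -0.5], [-0.5, 1.0]]

single = mean_covariance_value([1.0, 0.0], mean, cov, rho=1.0)
mixed = mean_covariance_value([0.5, 0.5], mean, cov, rho=1.0)
print(single, mixed)
```

With these illustrative numbers, splitting the weight across the two negatively correlated options scores higher than putting all weight on one, which is the kind of effect a per-option risk measure such as variance alone cannot capture.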
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
