Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence

Shunta Nonaga; Koji Tabata; Yuta Mizuno; Tamiki Komatsuzaki

arXiv:2506.22253 · cs.LG · October 27, 2025

TL;DR

This paper introduces a new stochastic bandit problem focusing on identifying Pareto-optimal arms that balance reward and risk using the mean-variance criterion, with a unified framework for fixed-confidence and fixed-budget settings.

Contribution

It proposes a novel meta-algorithmic framework for risk-aware bandit optimization that handles both regimes and provides theoretical guarantees and empirical validation.

Findings

1. Outperforms existing methods in accuracy and sample efficiency
2. Effective in synthetic benchmarks for risk-aware decision-making
3. Provides theoretical guarantees on solution correctness

Abstract

Making decisions under uncertainty so as to maximize expected reward while minimizing risk is a ubiquitous problem across many fields. Here, we introduce a novel problem setting in stochastic bandit optimization that jointly addresses two critical aspects of decision-making: maximizing expected reward and minimizing the associated uncertainty, quantified via the mean-variance (MV) criterion. Unlike traditional bandit formulations that focus solely on expected returns, our objective is to efficiently and accurately identify the Pareto-optimal set of arms that strikes the best trade-off between expected performance and risk. We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes, achieved through adaptive design of confidence intervals tailored to each scenario using the same sample exploration…
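To make the target of the identification problem concrete, the following is a minimal sketch of what a Pareto-optimal set over (mean, variance) pairs looks like. It is an illustration only, not the paper's algorithm: it assumes the true mean and variance of each arm are already known, whereas the bandit setting has to estimate them from samples. The dominance rule used here (no worse in both criteria, strictly better in at least one) is the standard textbook definition and is an assumption about the paper's exact formulation. The function name `pareto_optimal_arms` is hypothetical.

```python
from typing import List, Sequence


def pareto_optimal_arms(means: Sequence[float],
                        variances: Sequence[float]) -> List[int]:
    """Return indices of arms that no other arm dominates.

    Assumed dominance rule (not taken from the paper): arm j dominates
    arm i if j has mean >= i's mean and variance <= i's variance, with
    at least one of the two comparisons strict.
    """
    if len(means) != len(variances):
        raise ValueError("means and variances must have the same length")

    n = len(means)
    optimal = []
    for i in range(n):
        dominated = False
        for j in range(n):
            if j == i:
                continue
            no_worse = means[j] >= means[i] and variances[j] <= variances[i]
            strictly_better = means[j] > means[i] or variances[j] < variances[i]
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            optimal.append(i)
    return optimal


# Illustrative values only; they are not data from the paper.
example_means = [1.0, 0.8, 0.5, 0.9]
example_variances = [0.5, 0.2, 0.3, 0.6]
print(pareto_optimal_arms(example_means, example_variances))  # [0, 1]
```

In this toy example, arm 2 is dominated by arm 1 (higher mean, lower variance) and arm 3 is dominated by arm 0, so only arms 0 and 1 remain. Neither of those two dominates the other: arm 0 has the higher mean, arm 1 the lower variance.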

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Risk and Portfolio Optimization

Methods: Sparse Evolutionary Training · Focus