Stochastic Multi-Objective Multi-Armed Bandits: Regret Definition and Algorithm

Mansoor Davoodi; Setareh Maghsudi

arXiv:2506.13125·cs.LG·June 17, 2025

Open Access

TL;DR

This paper introduces a new regret metric and a two-phase algorithm for multi-objective multi-armed bandit (MO-MAB) problems. The metric is designed to balance performance across conflicting objectives, and the algorithm achieves sublinear regret for optimal arms.

Contribution

It proposes a comprehensive regret metric and a two-phase algorithm tailored for multi-objective bandits, addressing limitations of previous Pareto regret measures.

Findings

01 The new regret metric accounts for all Pareto-optimal arms.

02 The two-phase algorithm achieves sublinear regret.

03 The approach balances conflicting objectives effectively.

Abstract

Multi-armed bandit (MAB) problems are widely applied to online optimization tasks that require balancing exploration and exploitation. In practical scenarios, these tasks often involve multiple conflicting objectives, giving rise to multi-objective multi-armed bandits (MO-MAB). Existing MO-MAB approaches predominantly rely on the Pareto regret metric introduced in \cite{drugan2013designing}. However, this metric has notable limitations, particularly in accounting for all Pareto-optimal arms simultaneously. To address these challenges, we propose a novel and comprehensive regret metric that ensures balanced performance across conflicting objectives. Additionally, we introduce the concept of Efficient Pareto-Optimal arms, which are specifically designed for online optimization. Based on our new metric, we develop a two-phase MO-MAB algorithm that achieves sublinear regret for…
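The notion of a Pareto-optimal arm used in the abstract can be made concrete with a small sketch. The Python below is only an illustration of standard Pareto dominance over known mean reward vectors, assuming every objective is maximized. It is not the paper's regret metric, its Efficient Pareto-Optimal definition, or its two-phase algorithm. The function names `dominates` and `pareto_optimal_arms` and the example means are hypothetical.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v (all objectives maximized).

    u dominates v when u is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_optimal_arms(means: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of arms whose mean vector is not dominated by any other arm."""
    front = []
    for i, mu in enumerate(means):
        dominated = any(
            dominates(other, mu) for j, other in enumerate(means) if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical two-objective instance with four arms (illustrative values only).
example_means = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4)]
front = pareto_optimal_arms(example_means)
print(front)
```

In this toy instance, arm 3 is dominated by arm 1, while arms 0, 1, and 2 are mutually incomparable because each trades one objective against the other. A regret metric that tracks only the distance to the nearest front member cannot tell a learner that plays arm 0 forever from one that spreads its pulls across the whole front, which is the kind of limitation the abstract attributes to the Pareto regret.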

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques