Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?

Changkun Guan; Mengfan Xu

arXiv:2604.07096 · cs.LG · May 8, 2026

TL;DR

This paper shows that, in terms of Pareto regret, stochastic multi-objective bandits are not inherently harder than single-objective bandits, and it proposes an algorithm with optimal Pareto regret, supported by empirical evaluations.
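To make Pareto regret concrete, here is a minimal Python sketch. It assumes the Pareto suboptimality gap commonly used in the multi-objective bandit literature (the smallest uniform shift \(\epsilon \ge 0\) that makes an arm's mean vector non-dominated); the paper's exact definition may differ, and the helper names (`dominates`, `pareto_front`, `pareto_gaps`, `pareto_regret`) and the example means are hypothetical.

```python
import numpy as np

def dominates(u, v):
    """u Pareto-dominates v: at least as good in every objective, strictly better in one."""
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(mu):
    """Indices of arms (rows of the K x d mean matrix mu) that no other arm dominates."""
    return [i for i in range(len(mu))
            if not any(dominates(mu[j], mu[i]) for j in range(len(mu)) if j != i)]

def pareto_gaps(mu):
    """Assumed Pareto gap: smallest uniform shift eps >= 0 lifting each arm out of domination."""
    # diff[i, j, k] = mu[j, k] - mu[i, k]
    diff = mu[None, :, :] - mu[:, None, :]
    # arm j stops dominating mu[i] + eps once eps exceeds min_k diff[i, j, k]
    lift = diff.min(axis=2)
    # the diagonal of lift is 0, so the clip at 0 is only a guard
    return np.maximum(lift.max(axis=1), 0.0)

def pareto_regret(mu, pulls):
    """Cumulative Pareto regret of a sequence of pulled arm indices."""
    gaps = pareto_gaps(mu)
    return float(sum(gaps[a] for a in pulls))

# Hypothetical 4-arm, 2-objective instance
mu = np.array([[0.9, 0.1],
               [0.1, 0.9],
               [0.5, 0.5],
               [0.4, 0.4]])
print(pareto_front(mu))
print(pareto_gaps(mu))
print(pareto_regret(mu, [3, 3, 0, 2]))
```

In this instance arms 0, 1 and 2 form the Pareto front, while arm 3 is dominated by arm 2 and has gap 0.1 (up to floating-point rounding), so the pull sequence above accumulates Pareto regret of about 0.2.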

Contribution

It introduces a method that maintains upper and lower confidence bounds for every arm-objective pair and runs top-two races within each objective, achieving optimal Pareto regret independent of the number of objectives.

Findings

01. Pareto regret scales inversely with \(g^\dagger\), the largest objective-wise suboptimality gap.

02. The proposed method achieves Pareto regret of \(O(\log T / g^\dagger)\).

03. Empirical results show significant regret reduction and Pareto optimality.
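The summary does not spell out how \(g^\dagger\) is computed. As a hedged illustration only, the sketch below reads "objective-wise suboptimality gap" as the gap between the best and second-best arm in one objective, and takes \(g^\dagger\) as the largest such gap over objectives; this reading, the function names (`objectivewise_gaps`, `g_dagger`, `is_pareto_optimal`), and the example means are our assumptions, not the paper's definitions.

```python
import numpy as np

def objectivewise_gaps(mu):
    """Assumed per-objective gap: best minus second-best mean (mu: K x d)."""
    top2 = np.sort(mu, axis=0)[-2:, :]   # two largest means per objective, ascending
    return top2[1] - top2[0]

def g_dagger(mu):
    """Assumed reading of g^dagger: the largest objective-wise gap and the objective attaining it."""
    gaps = objectivewise_gaps(mu)
    k = int(np.argmax(gaps))
    return float(gaps[k]), k

def is_pareto_optimal(mu, i):
    """True if no other arm is >= arm i in every objective and > in at least one."""
    return not any(np.all(mu[j] >= mu[i]) and np.any(mu[j] > mu[i])
                   for j in range(len(mu)) if j != i)

# Hypothetical 3-arm, 2-objective instance
mu = np.array([[0.9, 0.3],
               [0.2, 0.8],
               [0.5, 0.7]])
g, k = g_dagger(mu)
arm = int(np.argmax(mu[:, k]))      # unique best arm in the objective attaining g^dagger
T = 10_000
print(g, k, arm, is_pareto_optimal(mu, arm))
# log T / gap per objective: the objective with the largest gap gives the smallest value
print(np.log(T) / objectivewise_gaps(mu))
```

Here objective 0 has gap 0.4 and objective 1 only 0.1, so under this reading \(g^\dagger = 0.4\) and \(\log T / g^\dagger\) is four times smaller than the corresponding quantity for objective 1. The arm that strictly maximizes a single objective can never be Pareto-dominated, which is consistent with committing to the best arm of the objective attaining \(g^\dagger\).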

Abstract

Multi-objective bandits have attracted increasing attention for their broad applicability, with \(d\)-dimensional reward vectors inducing Pareto regret. There has been a subtle debate over whether this added structure makes the problem fundamentally harder than single-objective bandits. We answer this by showing that, in terms of Pareto regret, it is surprisingly no harder: Pareto regret scales inversely with \(g^\dagger\), the largest objective-wise suboptimality gap, and thus matches the smallest objective-wise classical regret. We formalize this idea via a novel method with upper and lower confidence-bound estimators for every arm-objective pair. It uses top-two races to compare arms within each objective and an uncertainty-greedy rule to allocate exploration toward the largest objective-wise gap \(g^\dagger\), until the corresponding Pareto-optimal arm is committed to. We prove that…
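The abstract describes the algorithm only at a high level, so the following toy simulation is a sketch in that spirit, not the authors' algorithm. Every concrete choice is a hypothetical assumption: the Hoeffding-style radius with a fixed confidence level `delta` (no union bound over rounds), a "top-two race" taken as the empirical leader in an objective versus the non-leader with the highest upper confidence bound, an "uncertainty-greedy" rule that focuses on the objective with the largest estimated leader-challenger gap and pulls the less-sampled arm of that race, independent Bernoulli rewards, and the names `race_step` and `run`.

```python
import numpy as np

def race_step(sums, counts, delta):
    """Decide the next pull from per arm-objective confidence bounds.

    Returns ("commit", arm) once some objective's top-two race is resolved,
    otherwise ("explore", arm) chosen by an assumed uncertainty-greedy rule.
    """
    means = sums / counts[:, None]
    # Hypothetical Hoeffding-style radius for rewards in [0, 1], fixed confidence delta
    radius = np.sqrt(np.log(1.0 / delta) / (2.0 * counts))[:, None]
    lcb, ucb = means - radius, means + radius
    best_gap, best_pair = -np.inf, None
    for k in range(means.shape[1]):
        leader = int(np.argmax(means[:, k]))          # empirical best arm in objective k
        rival_ucb = ucb[:, k].copy()
        rival_ucb[leader] = -np.inf
        challenger = int(np.argmax(rival_ucb))        # most optimistic rival
        if lcb[leader, k] > ucb[challenger, k]:
            return "commit", leader                   # leader's LCB beats every rival's UCB
        gap = means[leader, k] - means[challenger, k]
        if gap > best_gap:
            best_gap, best_pair = gap, (leader, challenger)
    # Assumed uncertainty-greedy rule: focus on the objective with the largest
    # estimated gap and pull the less-sampled arm of its race.
    leader, challenger = best_pair
    arm = leader if counts[leader] <= counts[challenger] else challenger
    return "explore", arm


def run(mu, T, delta=0.05, seed=0):
    """Simulate T rounds with independent Bernoulli rewards per objective (mu: K x d)."""
    rng = np.random.default_rng(seed)
    K, d = mu.shape
    sums, counts = np.zeros((K, d)), np.zeros(K)
    committed, pulls = None, []
    for t in range(T):
        if committed is not None:
            arm = committed                           # keep pulling the committed arm
        elif t < K:
            arm = t                                   # pull every arm once first
        else:
            action, arm = race_step(sums, counts, delta)
            if action == "commit":
                committed = arm
        sums[arm] += (rng.random(d) < mu[arm]).astype(float)
        counts[arm] += 1
        pulls.append(arm)
    return pulls, committed


# Hypothetical instance: arm 3 is dominated by arm 2
mu = np.array([[0.9, 0.3], [0.2, 0.8], [0.5, 0.7], [0.4, 0.6]])
pulls, committed = run(mu, T=5000)
print(committed, np.bincount(pulls, minlength=len(mu)))
```

Once a race resolves, the sketch stops exploring entirely, mirroring the abstract's "until the corresponding Pareto-optimal arm is committed to"; a real guarantee would additionally need confidence radii that hold uniformly over rounds.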
