Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?
Changkun Guan, Mengfan Xu

TL;DR
This paper shows that, in terms of Pareto regret, multi-objective bandits are not inherently harder than single-objective ones, and it proposes an optimal algorithm supported by empirical evaluation.
Contribution
It introduces a novel method with confidence bounds and top-two races that achieves optimal Pareto regret independent of the number of objectives.
Findings
Pareto regret scales inversely with the largest objective-wise suboptimality gap
The proposed method achieves Pareto regret of \(O(\log T / g^\dagger)\)
Empirical results show significant regret reduction and Pareto optimality
Abstract
Multi-objective bandits have attracted increasing attention for their broad applicability, with \(d\)-dimensional reward vectors inducing Pareto regret. There has been a subtle debate over whether this added structure makes the problem fundamentally harder than single-objective bandits. We answer this by showing that, in terms of Pareto regret, it is surprisingly no harder: Pareto regret scales inversely with \(g^\dagger\), the largest objective-wise suboptimality gap, and thus matches the smallest objective-wise classical regret. We formalize this idea via a novel method with upper and lower confidence-bound estimators for every arm-objective pair. It uses top-two races to compare arms within each objective and an uncertainty-greedy rule to allocate exploration toward the largest objective-wise gap \(g^\dagger\), until the corresponding Pareto-optimal arm is committed to. We prove that…
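To make the quantities in the abstract concrete, here is a minimal Python sketch that computes the Pareto front, the per-arm Pareto suboptimality gap, and a candidate value for \(g^\dagger\) from a table of mean reward vectors. The Pareto gap follows the standard definition (the smallest uniform boost that makes an arm non-dominated). The reading of \(g^\dagger\) as the largest, over objectives, of the gap between the best and second-best arm in that objective is an assumption made for illustration; this summary does not give the paper's exact definition. All function names and the example means are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a Pareto-dominates b: no worse in every objective, better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(means: Sequence[Vector]) -> List[int]:
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i for i, mu in enumerate(means)
        if not any(dominates(other, mu) for other in means)
    ]


def pareto_gap(means: Sequence[Vector], i: int) -> float:
    """Smallest eps >= 0 such that means[i] + eps (in every objective)
    is no longer dominated by any arm. Zero for Pareto-optimal arms."""
    worst = max(
        min(o - m for o, m in zip(other, means[i]))
        for other in means
    )
    return max(0.0, worst)


def objective_wise_gaps(means: Sequence[Vector]) -> List[float]:
    """For each objective, the gap between the best and second-best arm.
    ASSUMPTION: this is one plausible reading of 'objective-wise gap'."""
    d = len(means[0])
    gaps = []
    for j in range(d):
        values = sorted((mu[j] for mu in means), reverse=True)
        gaps.append(values[0] - values[1])
    return gaps


def largest_objective_wise_gap(means: Sequence[Vector]) -> float:
    """Hypothetical stand-in for g^dagger: the largest objective-wise gap."""
    return max(objective_wise_gaps(means))


if __name__ == "__main__":
    # Four arms, two objectives (illustrative numbers, not from the paper).
    means = [(0.9, 0.2), (0.5, 0.6), (0.4, 0.5), (0.2, 0.3)]
    print("Pareto front:", pareto_front(means))
    print("Pareto gaps:", [pareto_gap(means, i) for i in range(len(means))])
    print("Objective-wise gaps:", objective_wise_gaps(means))
    print("Candidate g_dagger:", largest_objective_wise_gap(means))
```

In this example arms 0 and 1 are Pareto-optimal, and objective 0 has the widest gap between its best and second-best arm. Under the assumed reading, a bound of order \(\log T / g^\dagger\) is then governed by the objective that is easiest to resolve, which is consistent with the abstract's claim that Pareto regret matches the smallest objective-wise classical regret.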
