Blessings of Multiple Good Arms in Multi-Objective Linear Bandits

Heesang Ann; Min-hwan Oh

arXiv:2602.12901·stat.ML·February 16, 2026


Open Access

TL;DR

This paper shows that, in multi-objective linear bandits, the presence of multiple good arms induces implicit exploration, so simple greedy algorithms perform well both theoretically and empirically without distributional assumptions on the contexts.

Contribution

It introduces the concept of implicit exploration in multi-objective bandits with multiple good arms and proposes a framework for Pareto fairness analysis.

Findings

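01 Simple greedy algorithms achieve strong performance when multiple good arms exist

02 Multiple good arms induce implicit exploration in multi-objective bandits

03 To the authors' knowledge, the first study of implicit exploration in multi-objective and parametric bandits without distributional assumptions on the contexts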

Abstract

The multi-objective bandit setting has traditionally been regarded as more complex than the single-objective case, as multiple objectives must be optimized simultaneously. In contrast to this prevailing view, we demonstrate that when multiple good arms exist for multiple objectives, they can induce a surprising benefit: implicit exploration. Under this condition, we show that simple algorithms that greedily select actions in most rounds can nonetheless achieve strong performance, both theoretically and empirically. To our knowledge, this is the first study to introduce implicit exploration in both multi-objective and parametric bandit settings without any distributional assumptions on the contexts. We further introduce a framework for effective Pareto fairness, which provides a principled approach to rigorously analyzing the fairness of multi-objective bandit algorithms.
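
To make the idea of greedy selection in a multi-objective linear bandit concrete, here is a minimal Python sketch. It is not the algorithm from the paper, whose details this page does not give. It assumes each arm is a feature vector, each objective has its own estimated parameter vector, and a greedy step picks uniformly at random among the arms on the estimated Pareto front. The function names (`dominates`, `pareto_front`, `estimated_rewards`, `greedy_pick`) and the uniform tie-breaking rule are illustrative assumptions.

```python
import random


def dominates(u, v):
    """True if u is at least as good as v in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(rewards):
    """Indices of arms whose reward vectors are not dominated by any other arm."""
    return [
        i
        for i, u in enumerate(rewards)
        if not any(dominates(v, u) for j, v in enumerate(rewards) if j != i)
    ]


def estimated_rewards(arms, thetas):
    """Linear model: the reward of arm x under objective m is the inner product <x, theta_m>."""
    return [
        [sum(xi * ti for xi, ti in zip(x, theta)) for theta in thetas]
        for x in arms
    ]


def greedy_pick(arms, thetas, rng):
    """Assumed greedy rule: choose uniformly among arms on the estimated Pareto front."""
    front = pareto_front(estimated_rewards(arms, thetas))
    return rng.choice(front)


# Hypothetical example: four arms in two dimensions, two objectives.
arms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.2]]
thetas = [[1.0, 0.0], [0.0, 1.0]]  # current parameter estimates, one per objective
front = pareto_front(estimated_rewards(arms, thetas))
choice = greedy_pick(arms, thetas, random.Random(0))
```

In this example the last arm is dominated by the third, so only the first three arms are candidates for the greedy step. Having several such non-dominated arms is the kind of situation the abstract describes, although how the paper's algorithms exploit it is not specified here.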


Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics