Blessings of Multiple Good Arms in Multi-Objective Linear Bandits
Heesang Ann, Min-hwan Oh

TL;DR
This paper shows that when multiple good arms exist across objectives in multi-objective linear bandits, they induce implicit exploration, so simple, mostly greedy algorithms perform well both theoretically and empirically, without distributional assumptions on the contexts.
Contribution
It introduces the concept of implicit exploration in multi-objective bandits with multiple good arms and proposes a framework for Pareto fairness analysis.
Findings
Simple greedy algorithms achieve strong performance
Implicit exploration benefits multi-objective bandits
First study of implicit exploration in multi-objective and parametric bandits without distributional assumptions on the contexts
Abstract
The multi-objective bandit setting has traditionally been regarded as more complex than the single-objective case, as multiple objectives must be optimized simultaneously. In contrast to this prevailing view, we demonstrate that when multiple good arms exist for multiple objectives, they can induce a surprising benefit: implicit exploration. Under this condition, we show that simple algorithms that greedily select actions in most rounds can nonetheless achieve strong performance, both theoretically and empirically. To our knowledge, this is the first study to introduce implicit exploration in both multi-objective and parametric bandit settings without any distributional assumptions on the contexts. We further introduce a framework for effective Pareto fairness, which provides a principled approach to rigorously analyzing the fairness of multi-objective bandit algorithms.
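To make the idea of a mostly greedy algorithm concrete, here is a minimal Python sketch. It is not the paper's algorithm: the function name, the round-robin warm-up, the rotation over target objectives, the shared ridge estimator, and the Gaussian noise model are all assumptions chosen for illustration. The intent is only to show how acting greedily for a different objective each round can spread pulls over several good arms, so that feature directions get explored without an explicit exploration bonus.

```python
import numpy as np


def greedy_multi_objective_bandit(arms, thetas, horizon=500, warmup=6,
                                  noise_std=0.1, ridge=1.0, seed=0):
    """Illustrative mostly greedy loop (hypothetical, not the authors' method).

    arms:   (K, d) array of fixed arm feature vectors.
    thetas: (M, d) array of true parameters, one row per objective.
    Returns the (M, d) ridge estimates and the (K,) pull counts.
    """
    rng = np.random.default_rng(seed)
    n_arms, dim = arms.shape
    n_obj = thetas.shape[0]
    gram = ridge * np.eye(dim)         # regularized Gram matrix, shared by objectives
    moments = np.zeros((n_obj, dim))   # per-objective running sum of reward * features
    pulls = np.zeros(n_arms, dtype=int)

    for t in range(horizon):
        if t < warmup:
            # Assumed warm-up: cycle through the arms a few times.
            arm = t % n_arms
        else:
            # Greedy step: ridge estimates, then the best arm for this round's objective.
            theta_hat = np.linalg.solve(gram, moments.T).T
            target = t % n_obj
            arm = int(np.argmax(arms @ theta_hat[target]))
        x = arms[arm]
        # Every objective's noisy reward is observed for the pulled arm.
        rewards = thetas @ x + noise_std * rng.standard_normal(n_obj)
        gram += np.outer(x, x)
        moments += np.outer(rewards, x)
        pulls[arm] += 1

    theta_hat = np.linalg.solve(gram, moments.T).T
    return theta_hat, pulls


# Hypothetical instance: two objectives whose best arms point in different directions.
example_arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
example_thetas = np.array([[1.0, 0.0], [0.0, 1.0]])
estimates, pull_counts = greedy_multi_objective_bandit(example_arms, example_thetas)
```

In this toy instance the two objectives favor different arms, so rotating the greedy target keeps both feature directions sampled after the warm-up. Inspecting `pull_counts` and `estimates` is one way to see that effect. The sketch says nothing about the regret guarantees or the Pareto fairness framework described in the abstract.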
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics
