TL;DR
FraPPE introduces a computationally efficient algorithm for preference-based pure exploration in multi-objective bandits, achieving asymptotically optimal sample complexity and outperforming existing methods in identifying Pareto optimal arms.
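Identifying Pareto optimal arms requires a dominance relation, which the abstract says comes from a preference cone. The following is a minimal sketch, not the paper's code. It assumes a pointed polyhedral cone written as C = {x : W x >= 0} and the usual convention that arm j dominates arm i when mu_j - mu_i is a non-zero vector in C. The matrix `W` and the helper names `dominates` and `pareto_set` are illustrative assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(mu_j: Vector, mu_i: Vector, W: Sequence[Vector], tol: float = 1e-12) -> bool:
    """True if arm j dominates arm i under the (assumed) cone C = {x : W x >= 0}.

    Assumed convention: j dominates i when mu_j - mu_i is a non-zero vector in C.
    """
    diff = [a - b for a, b in zip(mu_j, mu_i)]
    if all(abs(d) <= tol for d in diff):
        return False  # equal mean vectors: neither arm dominates the other
    # diff lies in C iff every half-space constraint row . diff >= 0 holds
    return all(sum(r * d for r, d in zip(row, diff)) >= -tol for row in W)


def pareto_set(means: List[Vector], W: Sequence[Vector]) -> List[int]:
    """Indices of arms that no other arm dominates under C."""
    return [
        i
        for i, mu_i in enumerate(means)
        if not any(dominates(mu_j, mu_i, W) for j, mu_j in enumerate(means) if j != i)
    ]


# Positive orthant (W = identity) recovers ordinary Pareto dominance.
orthant = [[1.0, 0.0], [0.0, 1.0]]
# A wider pointed cone containing the orthant: {x : 2*x1 + x2 >= 0, x1 + 2*x2 >= 0}.
wide = [[2.0, 1.0], [1.0, 2.0]]
means = [[1.0, 0.0], [0.9, 0.25]]
print(pareto_set(means, orthant))  # [0, 1]
print(pareto_set(means, wide))     # [1]
```

The example shows why the cone matters. Under the orthant neither arm dominates the other, but the wider cone accepts the trade (-0.1, +0.25) as an improvement, so arm 0 drops out of the Pareto set.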
Contribution
The paper develops a novel, efficient algorithm for preference-based pure exploration that optimally tracks the lower bound for arbitrary preference cones, with significant computational speedups.
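Tracking the lower bound means steering the empirical sampling counts toward the optimal proportions w that the lower bound prescribes. The abstract does not spell out FraPPE's tracking rule. As a stand-in, here is a sketch of the standard D-tracking rule from the Track-and-Stop literature: force-sample any arm pulled fewer than sqrt(t) - K/2 times, and otherwise pull the arm with the largest deficit t * w_k - N_k(t). The function name and the fixed `target` vector are hypothetical. In an actual algorithm the target would be recomputed every round from the empirical means.

```python
import math
from typing import List, Sequence


def d_tracking_arm(counts: Sequence[int], target: Sequence[float]) -> int:
    """Pick the next arm so empirical counts follow the target proportions."""
    t = sum(counts)
    K = len(counts)
    # Forced exploration: arms sampled fewer than sqrt(t) - K/2 times go first.
    starved = [k for k in range(K) if counts[k] < math.sqrt(t) - K / 2]
    if starved:
        return min(starved, key=lambda k: counts[k])
    # Otherwise pull the arm lagging furthest behind its target count t * w_k.
    return max(range(K), key=lambda k: t * target[k] - counts[k])


# With a fixed (hypothetical) target, the empirical proportions approach it.
target = [0.6, 0.3, 0.1]
counts: List[int] = [0, 0, 0]
for _ in range(3000):
    counts[d_tracking_arm(counts, target)] += 1
print([c / 3000 for c in counts])
```

The forced-exploration branch keeps every arm's estimate consistent, so a plug-in target computed from empirical means can converge to the true optimal proportions.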
Findings
FraPPE achieves the lowest sample complexities among the methods compared in the experiments.
The algorithm asymptotically attains optimal sample complexity.
Significant acceleration in solving the lower bound maximisation problem.
Abstract
Preference-based Pure Exploration (PrePEx) aims to identify, with a given confidence level, the set of Pareto optimal arms in a vector-valued (aka multi-objective) bandit, where the reward vectors are ordered via a (given) preference cone. Although PrePEx and its variants are well studied, no computationally efficient algorithm exists that optimally tracks the known lower bound for arbitrary preference cones. We fill this gap by efficiently solving the minimisation and maximisation problems in the lower bound. First, we derive three structural properties of the lower bound that yield a computationally tractable reduction of the minimisation problem. Then, we deploy a Frank-Wolfe optimiser to accelerate the maximisation problem in the lower bound. Together, these techniques solve the maxmin optimisation problem in time for a…
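The abstract states that the maximisation in the lower bound is handled by a Frank-Wolfe optimiser. In the usual pure-exploration lower bound, this is a maximisation over sampling proportions w in the simplex of a minimum over alternative instances. The sketch below is a generic Frank-Wolfe loop on the simplex, not FraPPE itself. It makes two assumptions:

- The minimisation has already been reduced to finitely many alternatives with per-arm costs. This is the hypothetical `costs` matrix, where `costs[a][k]` stands in for the divergence paid at arm k against alternative a.
- The non-smooth min is replaced by a log-sum-exp soft-min with temperature `beta`. This smoothing is our choice, not something the abstract states.

```python
import math
from typing import List, Sequence, Tuple


def soft_min_and_grad(
    w: Sequence[float], costs: Sequence[Sequence[float]], beta: float
) -> Tuple[float, List[float]]:
    """Log-sum-exp soft-min of g_a(w) = sum_k w_k * costs[a][k] and its gradient in w."""
    g = [sum(wk * ck for wk, ck in zip(w, row)) for row in costs]
    g_min = min(g)
    # Shift by g_min so the exponentials cannot overflow.
    e = [math.exp(-beta * (ga - g_min)) for ga in g]
    z = sum(e)
    value = g_min - math.log(z) / beta  # always <= min_a g_a(w)
    probs = [ea / z for ea in e]  # soft weights on the alternatives
    grad = [sum(p * row[k] for p, row in zip(probs, costs)) for k in range(len(w))]
    return value, grad


def frank_wolfe_allocation(
    costs: Sequence[Sequence[float]], beta: float = 50.0, iters: int = 2000
) -> List[float]:
    """Approximately maximise min_a sum_k w_k * costs[a][k] over the simplex."""
    K = len(costs[0])
    w = [1.0 / K] * K  # start from uniform sampling
    for t in range(iters):
        _, grad = soft_min_and_grad(w, costs, beta)
        # The linear maximisation over the simplex is attained at a vertex e_k.
        k_star = max(range(K), key=lambda k: grad[k])
        gamma = 2.0 / (t + 2.0)  # classic open-loop Frank-Wolfe step size
        w = [(1.0 - gamma) * wk for wk in w]
        w[k_star] += gamma
    return w


# Toy instance: two alternatives, so the max-min is attained where 2*w0 == w1.
w = frank_wolfe_allocation([[2.0, 0.0], [0.0, 1.0]])
# w ends up close to [1/3, 2/3], up to smoothing bias and step-size oscillation.
print(w)
```

Each iteration only needs a gradient and an argmax over K vertices, with no projection onto the simplex, which is the usual appeal of Frank-Wolfe here. A larger `beta` shrinks the smoothing bias (at most log(number of alternatives)/beta) but makes the gradient steeper, so the iterates take longer to settle.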