Learning the Pareto Front Using Bootstrapped Observation Samples
Wonyoung Kim, Garud Iyengar, Assaf Zeevi

TL;DR
This paper introduces a new algorithm for Pareto front identification in linear bandits that uses a novel estimator and sample reuse technique, achieving near-optimal sample complexity and regret bounds.
Contribution
The paper proposes a new estimator for Pareto front identification in linear bandits that, in every round, updates the parameter estimate along multiple context directions rather than only along the chosen context, combined with a sample reuse method.
Findings
The algorithm's sample complexity is optimal up to a logarithmic factor.
Regret incurred during estimation is within a logarithmic factor of the optimal regret among all algorithms that identify the Pareto front.
Numerical experiments confirm effective Pareto front identification.
Abstract
We consider Pareto front identification (PFI) for linear bandits (PFILin), i.e., the goal is to identify a set of arms with undominated mean reward vectors when the mean reward vector is a linear function of the context. PFILin includes the best arm identification problem and multi-objective active learning as special cases. The sample complexity of our proposed algorithm is optimal up to a logarithmic factor. In addition, the regret incurred by our algorithm during the estimation is within a logarithmic factor of the optimal regret among all algorithms that identify the Pareto front. Our key contribution is a new estimator that in every round updates the estimate for the unknown parameter along multiple context directions -- in contrast to the conventional estimator that only updates the parameter estimate along the chosen context. This allows us to use low-regret arms to collect…
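To make the notion of "undominated mean reward vectors" concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes the standard dominance relation (arm j dominates arm i if it is at least as good in every objective and strictly better in at least one), which may differ in detail from the definition used in the paper. The function name `pareto_front`, the context matrix `X`, and the parameter matrix `Theta` are hypothetical and chosen only for illustration; the parameter is treated as known here, whereas in PFILin it must be estimated from observations.

```python
import numpy as np


def pareto_front(means: np.ndarray) -> list:
    """Return indices of arms whose mean reward vectors are undominated.

    Assumed dominance rule: arm j dominates arm i if means[j] >= means[i]
    in every objective and means[j] > means[i] in at least one objective.
    """
    n_arms = means.shape[0]
    front = []
    for i in range(n_arms):
        dominated = False
        for j in range(n_arms):
            if j == i:
                continue
            if np.all(means[j] >= means[i]) and np.any(means[j] > means[i]):
                dominated = True
                break
        if not dominated:
            front.append(i)
    return front


# Hypothetical instance: 4 arms, 2-dimensional contexts, 2 objectives.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [0.2, 0.2]])          # one context row per arm
Theta = np.array([[1.0, 0.0],
                  [0.0, 1.0]])      # one column per objective

# Linear model: each arm's mean reward vector is its context times Theta.
means = X @ Theta
print(pareto_front(means))              # arm 3 is dominated by arm 2

# With a single objective the Pareto front reduces to the best arm.
print(pareto_front(X @ Theta[:, :1]))
```

In this toy instance the first three arms are mutually undominated, while the fourth is dominated by the third. Restricting to one objective leaves only the arm with the highest mean reward, which illustrates why best arm identification is a special case.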
Taxonomy
Topics: Advanced Bandit Algorithms Research · Receptor Mechanisms and Signaling · Adaptive Dynamic Programming Control
