Learning the Pareto Front Using Bootstrapped Observation Samples

Wonyoung Kim; Garud Iyengar; Assaf Zeevi

arXiv:2306.00096 · stat.ML · May 24, 2024 · 1 cites

TL;DR

This paper introduces an algorithm for Pareto front identification in linear bandits. It combines a new estimator with a sample reuse technique, and both its sample complexity and its regret during estimation are within a logarithmic factor of optimal.

Contribution

The key contribution is a new estimator that, in every round, updates the estimate of the unknown parameter along multiple context directions, in contrast to the conventional estimator, which updates the estimate only along the chosen context.

Findings

01. The algorithm's sample complexity is optimal up to a logarithmic factor.

02. The regret incurred during estimation is within a logarithmic factor of the optimal regret among all algorithms that identify the Pareto front.

03. Numerical experiments confirm effective Pareto front identification.

Abstract

We consider Pareto front identification (PFI) for linear bandits (PFILin), i.e., the goal is to identify a set of arms with undominated mean reward vectors when the mean reward vector is a linear function of the context. PFILin includes the best arm identification problem and multi-objective active learning as special cases. The sample complexity of our proposed algorithm is optimal up to a logarithmic factor. In addition, the regret incurred by our algorithm during the estimation is within a logarithmic factor of the optimal regret among all algorithms that identify the Pareto front. Our key contribution is a new estimator that in every round updates the estimate for the unknown parameter along multiple context directions -- in contrast to the conventional estimator that only updates the parameter estimate along the chosen context. This allows us to use low-regret arms to collect…
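To make the target of PFILin concrete, the following is a minimal sketch of what "a set of arms with undominated mean reward vectors" means once the mean rewards are known. It is not the paper's algorithm and involves no estimation or sampling. The names `pareto_front`, `contexts`, and `theta` are hypothetical, the toy numbers are made up for illustration, and the dominance convention used here (at least as good in every objective and strictly better in at least one) is a common one that is assumed rather than taken from the paper.

```python
import numpy as np


def pareto_front(means: np.ndarray) -> list:
    """Return the indices of rows of `means` that no other row dominates.

    Assumed convention (not taken from the paper): row j dominates row i
    if it is >= in every objective and > in at least one objective.
    """
    front = []
    for i, y in enumerate(means):
        dominated = any(
            np.all(other >= y) and np.any(other > y)
            for j, other in enumerate(means)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical toy instance: 4 arms, context dimension d = 2, 2 objectives.
contexts = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
    [0.2, 0.2],
])
# Assumed known here; in the bandit problem this parameter is unknown.
theta = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
])

# Linear model: each arm's mean reward vector is a linear function of its context.
means = contexts @ theta
front = pareto_front(means)
print(front)
```

In this toy instance the last arm is dominated by the third, so the other three arms form the Pareto front. With a single objective the same function returns the arm or arms with the largest mean, which matches the abstract's remark that best arm identification is a special case.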

Taxonomy

Topics: Advanced Bandit Algorithms Research · Receptor Mechanisms and Signaling · Adaptive Dynamic Programming Control