Sampling from a $k$-DPP without looking at all items
Daniele Calandriello, Michał Dereziński, Michal Valko

TL;DR
This paper introduces an efficient algorithm for sampling from a $k$-DPP that requires observing only a small subset of all items, significantly reducing computational costs while maintaining exact distributional guarantees.
Contribution
The authors develop a novel adaptive sampling algorithm that efficiently generates $k$-DPP samples without examining all items, improving scalability for large datasets.
Findings
Samples several orders of magnitude faster than previous methods.
Produces exact $k$-DPP samples by observing only a small fraction of data.
Empirically validated on large datasets with high accuracy.
Abstract
Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, stochastic optimization, active learning and more. Given a kernel function and a subset size $k$, our goal is to sample $k$ out of $n$ items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. $k$-DPP). Existing $k$-DPP sampling algorithms require an expensive preprocessing step which involves multiple passes over all items, making it infeasible for large datasets. A naïve heuristic addressing this problem is to uniformly subsample a fraction of the data and perform $k$-DPP sampling only on those items; however, this method offers no guarantee that the produced sample will even approximately resemble the target distribution over the original dataset. In this…
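To make the $k$-DPP definition and the weakness of the naïve heuristic concrete, here is a brute-force sketch. It is not the authors' algorithm: it enumerates every size-$k$ subset $S$ and sets $P(S) = \det(L_S) / \sum_{|S'|=k} \det(L_{S'})$, which is exactly the look-at-all-items cost the paper is designed to avoid. It then computes, also exactly, the distribution produced by uniformly keeping $m$ of the $n$ items and running a $k$-DPP on them alone. The RBF kernel on four 1-D points, the sizes $n=4$, $k=2$, $m=3$, and all function names are illustrative assumptions.

```python
import math
import random
from itertools import combinations


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def kdpp_probs(L, items, k):
    """Exact k-DPP restricted to `items`: P(S) is proportional to det(L_S), |S| = k."""
    weights = {S: det([[L[i][j] for j in S] for i in S]) for S in combinations(items, k)}
    total = sum(weights.values())
    return {S: w / total for S, w in weights.items()}


def sample_kdpp(L, k, rng):
    """Draw one subset from the exact k-DPP over all items (brute force, tiny n only)."""
    probs = kdpp_probs(L, range(len(L)), k)
    subsets = list(probs)
    return rng.choices(subsets, weights=[probs[S] for S in subsets], k=1)[0]


def naive_subsample_probs(L, k, m):
    """Distribution induced by the heuristic: keep m items uniformly at random,
    then run an exact k-DPP on just those m items."""
    pools = list(combinations(range(len(L)), m))
    probs = {}
    for T in pools:
        for S, p in kdpp_probs(L, T, k).items():
            probs[S] = probs.get(S, 0.0) + p / len(pools)
    return probs


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in keys)


# Hypothetical toy data: three near-duplicate points and one distinct point, RBF kernel.
points = [0.0, 0.1, 0.2, 3.0]
L = [[math.exp(-(a - b) ** 2) for b in points] for a in points]

target = kdpp_probs(L, range(len(L)), k=2)
naive = naive_subsample_probs(L, k=2, m=3)
print("sample:", sample_kdpp(L, 2, random.Random(0)))
print(f"TV(target, naive) = {total_variation(target, naive):.3f}")
```

On this toy kernel the true $k$-DPP rarely pairs two of the near-duplicate items 0, 1, 2, but the heuristic is forced to pair them whenever the distinct item 3 is left out of the subsample, so the total variation distance between the two distributions is far from zero.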
