State-of-art minibatches via novel DPP kernels: discretization, wavelets, and rough objectives

Hoang-Son Tran; Pranav Gupta; Rémi Bardenet; Subhroshekhar Ghosh

arXiv:2605.13127 · stat.ML · May 14, 2026


TL;DR

This paper advances DPP-based minibatch and coreset sampling on two fronts: it introduces wavelet-based DPPs with improved accuracy guarantees, and it gives a technique for converting continuous DPPs into discrete, low-rank kernels defined on a given dataset while preserving the variance decay.

Contribution

The paper proposes new wavelet-based DPPs with better accuracy guarantees, together with a method to convert continuous DPPs into discrete kernels suitable for ML tasks.

Findings

1. Wavelet-based DPPs achieve superior accuracy guarantees.

2. The conversion method preserves variance decay and yields low-rank kernels.

3. The approach applies to ML objectives with low regularity.

Abstract

Determinantal point processes (DPPs) have emerged as a kernelized alternative to vanilla independent sampling for generating efficient minibatches, coresets and other parsimonious representations of large-scale datasets. While theoretical foundations and promising empirical performance have been demonstrated, there are two challenges for current proposals for DPP-based coresets or minibatches. The first is the need for families of DPPs with certain key variance reduction properties, usually constructed in a continuous setting, of which there are few known examples. The second is the need for an ad-hoc construction of a discrete DPP defined on a given dataset, that inherits such variance reduction. In this work, we contribute to the programme of establishing DPPs as a subsampling toolbox for ML by advancing on these two fronts. First, we propose new DPPs on the Euclidean space based on…
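
To make the contrast with independent sampling concrete, here is a minimal sketch of drawing a minibatch from a discrete projection DPP with a low-rank kernel and reweighting it into an unbiased estimate of a dataset-wide sum. This is not the paper's construction: the cosine feature map, the function names, and the choice of the generic sequential sampler for projection DPPs are all assumptions made for illustration. The only facts relied on are standard ones: for a DPP with marginal kernel K, item i is included with probability K_ii, so weighting each sampled term by 1/K_ii gives an unbiased estimator.

```python
import numpy as np


def cosine_features(x, k):
    """Hypothetical feature map: first k cosine modes at points x in [0, 1]."""
    return np.cos(np.pi * np.outer(x, np.arange(k)))


def orthonormal_basis(features):
    """Orthonormalize the feature columns so K = U @ U.T is a rank-k projection."""
    U, _ = np.linalg.qr(features)  # reduced QR: U has shape (N, k)
    return U


def sample_projection_dpp(U, rng):
    """Draw exactly k distinct indices from the projection DPP with kernel U @ U.T."""
    N, k = U.shape
    V = U.copy()
    batch = []
    for _ in range(k):
        # Squared row norms of the current basis are proportional to the
        # conditional inclusion probabilities given the items already chosen.
        p = np.sum(V**2, axis=1)
        p /= p.sum()
        i = int(rng.choice(N, p=p))
        batch.append(i)
        # Restrict the span to vectors that vanish at coordinate i:
        # eliminate row i using the column with the largest entry there.
        j = int(np.argmax(np.abs(V[i])))
        v = V[:, j] / V[i, j]
        V = V - np.outer(v, V[i])      # row i and column j are now zero
        V = np.delete(V, j, axis=1)    # drop the zeroed column
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)     # re-orthonormalize the remaining basis
    return batch


def dpp_minibatch_estimate(values, U, rng):
    """Estimate values.sum() from one DPP minibatch with 1 / K_ii weights."""
    inclusion = np.sum(U**2, axis=1)   # diagonal of K = U @ U.T
    batch = sample_projection_dpp(U, rng)
    return float(np.sum(values[batch] / inclusion[batch])), batch


rng = np.random.default_rng(0)
x = rng.random(500)                    # toy one-dimensional dataset
U = orthonormal_basis(cosine_features(x, 10))
estimate, batch = dpp_minibatch_estimate(np.sin(3 * x), U, rng)
print(len(batch), estimate, np.sin(3 * x).sum())
```

Because the kernel has rank k, every minibatch has exactly k points, and each sampling step costs one QR factorization of an N-by-(k - t) matrix. The sketch favors readability over speed; production samplers typically avoid the repeated factorization.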
