Predictive Coresets

Bernardo Flores

arXiv:2502.05725·stat.CO·February 13, 2025


TL;DR

This paper introduces a novel variational coreset method that efficiently constructs smaller data subsets for nonparametric models by matching posterior predictive distributions, enabling scalable inference on large datasets.

Contribution

It proposes a new variational approach using randomized posteriors for coreset construction applicable to nonparametric models, overcoming limitations of traditional KL-based methods.

Findings

01 Effective on diverse problems like density estimation

02 Outperforms traditional methods in nonparametric settings

03 Provides scalable inference for large datasets

Abstract

Modern data analysis often involves massive datasets with hundreds of thousands of observations, making traditional inference algorithms computationally prohibitive. Coresets are selection methods that choose a smaller, weighted subset of observations while maintaining similar learning performance. Conventional coreset approaches determine the weights by minimizing the Kullback-Leibler (KL) divergence between the likelihood functions of the full and weighted datasets, which makes them ill-posed for nonparametric models, where the likelihood is often intractable. We propose an alternative variational method that employs randomized posteriors and finds weights that match the unknown posterior predictive distributions conditioned on the full and reduced datasets. Our approach provides a general algorithm based on predictive recursions suitable for nonparametric priors. We…
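To make the idea of comparing predictive distributions from the full and reduced datasets concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a toy conjugate Normal model with known noise (a parametric case chosen only because its predictive is available in closed form), uses uniform weights n/m on a random subsample instead of optimized weights, and all names (`predictive`, `kl_normal`, the prior settings) are hypothetical.

```python
import math
import random


def predictive(xs, ws, m0=0.0, s0=10.0, sigma=1.0):
    """Posterior predictive (mean, variance) for a Normal mean with known
    noise sd `sigma` and prior N(m0, s0^2). Observation i enters the
    likelihood with weight ws[i]. Toy model assumed for illustration."""
    prec = 1.0 / s0**2 + sum(ws) / sigma**2
    mean = (m0 / s0**2 + sum(w * x for w, x in zip(ws, xs)) / sigma**2) / prec
    return mean, 1.0 / prec + sigma**2


def kl_normal(p, q):
    """KL(N(p) || N(q)) for (mean, variance) pairs."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)


rng = random.Random(0)
n, m = 10_000, 100
data = [rng.gauss(2.0, 1.0) for _ in range(n)]

# Predictive conditioned on the full dataset (every weight equal to 1).
full = predictive(data, [1.0] * n)

# Predictive conditioned on a reduced dataset: m points, each weighted n/m
# so the total weight still equals n. A coreset method would tune these.
subset = rng.sample(data, m)
reduced = predictive(subset, [n / m] * m)

gap = kl_normal(full, reduced)
print(f"KL between full and reduced predictives: {gap:.6f}")
```

In this sketch the discrepancy `gap` plays the role of the quantity a coreset method would drive down by adjusting the weights; in the nonparametric setting the abstract targets, the predictive has no such closed form, which is where the predictive recursions come in.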


Taxonomy

Topics: Machine Learning and Data Classification