Accurate Coresets for Latent Variable Models and Regularized Regression
Sanskar Ranjan, Supratim Shit

TL;DR
This paper introduces a unified framework for constructing accurate coresets applicable to a broad class of machine learning models, including latent variable models and regularized regression, with provably small sizes and extensive experimental validation.
Contribution
It presents a general accurate-coreset construction framework, with algorithms for latent variable models and regularized regression. This extends accurate coresets beyond the limited range of models studied previously and yields smaller coreset sizes.
Findings
Coreset size for latent variable models is polynomial in the number of latent variables.
Coreset size for regularized regression is always smaller than d^p, because the construction captures the reduction in model complexity due to regularization.
Experimental results confirm theoretical size bounds and effectiveness on real datasets.
Abstract
Accurate coresets are a weighted subset of the original dataset, ensuring a model trained on the accurate coreset maintains the same level of accuracy as a model trained on the full dataset. Primarily, these coresets have been studied for a limited range of machine learning models. In this paper, we introduce a unified framework for constructing accurate coresets. Using this framework, we present accurate coreset construction algorithms for general problems, including a wide range of latent variable model problems and $\ell_p$-regularized $\ell_p$-regression. For latent variable models, our coreset size is $O(\mathrm{poly}(k))$, where $k$ is the number of latent variables. For $\ell_p$-regularized $\ell_p$-regression, our algorithm captures the reduction of model complexity due to regularization, resulting in a coreset whose size is always smaller than $d^p$ for a…
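To make the notion of an accurate coreset concrete, here is a minimal sketch for the simplest special case: unregularized least squares (p = 2). It uses the classical Carathéodory-based reduction, not the framework proposed in the paper, and the function names `caratheodory`, `accurate_ls_coreset`, and `ls_loss` are hypothetical names chosen for illustration. The idea is that the least-squares loss depends on the data only through the sum of outer products of the rows of [A | b], so any positively weighted subset that reproduces that sum gives the same loss for every parameter vector.

```python
import numpy as np

def caratheodory(points, weights):
    """Reduce a positively weighted point set to at most dim+1 points
    with the same total weight and the same weighted sum."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float).copy()
    dim = points.shape[1]
    active = np.flatnonzero(weights > 0)
    while active.size > dim + 1:
        idx = active[: dim + 2]                       # dim+2 points are affinely dependent
        diffs = (points[idx[1:]] - points[idx[0]]).T  # shape (dim, dim+1)
        v_tail = np.linalg.svd(diffs)[2][-1]          # null vector of diffs
        v = np.concatenate(([-v_tail.sum()], v_tail)) # sum(v) = 0, sum(v_i * p_i) = 0
        pos = v > 0
        ratios = np.full(v.shape, np.inf)
        ratios[pos] = weights[idx][pos] / v[pos]
        j = np.argmin(ratios)                         # first weight to reach zero
        new_w = np.maximum(weights[idx] - ratios[j] * v, 0.0)
        new_w[j] = 0.0                                # drop that point exactly
        weights[idx] = new_w
        active = np.flatnonzero(weights > 0)
    return weights

def accurate_ls_coreset(A, b):
    """Return indices and weights of a subset with the same least-squares loss."""
    M = np.hstack([A, b[:, None]])
    iu = np.triu_indices(M.shape[1])
    # Upper triangle of each outer product m_i m_i^T, flattened to a vector.
    feats = np.einsum("ni,nj->nij", M, M)[:, iu[0], iu[1]]
    w = caratheodory(feats, np.ones(len(M)))
    keep = np.flatnonzero(w > 0)
    return keep, w[keep]

def ls_loss(A, b, x, w=None):
    r = A @ x - b
    return np.sum(r**2 if w is None else w * r**2)

# Synthetic data, assumed only for this demonstration.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 2))
b = A @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=200)
keep, w = accurate_ls_coreset(A, b)

x = rng.normal(size=2)                     # an arbitrary query parameter vector
full = ls_loss(A, b, x)
small = ls_loss(A[keep], b[keep], x, w)
print(keep.size, full, small)
```

In this sketch the subset has at most (d+1)(d+2)/2 + 1 points regardless of n, where d is the number of columns of A. A ridge penalty does not depend on the data, so the same weighted subset also preserves the regularized objective; the sketch does not exploit regularization to shrink the coreset further, which is the effect the abstract describes.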
Taxonomy
Topics: Neural Networks and Applications
Methods: Coresets
