TL;DR
This paper introduces improved frameworks for constructing coresets in offline and streaming settings. It reduces the dependence of coreset size on the total sensitivity from quadratic to near-linear, which yields more space-efficient solutions to machine learning problems such as projective clustering and subspace approximation.
Contribution
The paper presents a novel reduction to the sample complexity of learning functions with bounded VC dimension, improving coreset size bounds and generalizing sensitivity sampling methods.
Findings
Reduced the coreset size bound from O(t^2) to O(t log t), where t is the total sensitivity
Enhanced space efficiency for projective clustering and subspace approximation
Generalized sensitivity sampling supporting non-multiplicative approximations
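Ignoring the constants and any other factors hidden in the O-notation (for example, dependence on the accuracy parameter or on the dimension), the first finding can be read as a ratio of the two bounds in t alone. This is only a scaling illustration under that assumption, not a statement about concrete coreset sizes; the natural logarithm is used for the numerical example, and the base only changes a constant factor:

```latex
\frac{t^2}{t \log t} = \frac{t}{\log t},
\qquad \text{e.g. } t = 10^3:\quad \frac{10^3}{\ln 10^3} \approx \frac{1000}{6.91} \approx 145 .
```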
Abstract
A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f : P \times Q \to \mathbb{R}$ is a cost function, then a set $S \subseteq P$ with weights $w : P \to [0, \infty)$ is an $\varepsilon$-coreset for some parameter $\varepsilon > 0$ if $\sum_{s \in S} w(s) f(s, q)$ is a $(1 + \varepsilon)$ multiplicative approximation to $\sum_{p \in P} f(p, q)$ for all $q \in Q$. Coresets are used to solve fundamental problems in machine learning under various big data models of computation. Many of the suggested coresets in the recent decade used, or could have used, a general framework for constructing coresets whose size depends quadratically on what is known as total sensitivity $t$. In this paper we improve this bound from $O(t^2)$ to $O(t \log t)$. Thus our results imply…
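To make the coreset definition concrete, here is a minimal Python sketch of classic sensitivity sampling, the general framework whose size bound the paper improves. It is not the paper's construction or its refined analysis. Each point is drawn with probability proportional to an upper bound s(p) on its sensitivity and reweighted by t / (m s(p)), where t is the sum of the bounds and m is the number of draws. The helper `is_eps_coreset` checks the (1 + ε) multiplicative guarantee only on a finite list of queries, which is a necessary condition rather than a proof. All function names are hypothetical. The 1-D squared-distance cost and its sensitivity bound s(p) <= 2/n + 2(p - μ)^2 / Σ(x - μ)^2, which follows from (a + b)^2 <= 2a^2 + 2b^2, are an illustrative assumption and are not taken from the paper.

```python
import random
from typing import Callable, Dict, List, Sequence


def one_mean_sensitivity_bounds(points: Sequence[float]) -> List[float]:
    """Hypothetical helper: sensitivity upper bounds for f(p, q) = (p - q)**2 in 1-D.

    Uses (a + b)^2 <= 2a^2 + 2b^2 to get
        s(p) <= 2/n + 2 (p - mu)^2 / sum_x (x - mu)^2,
    so the bounds always sum to t = 4. Assumes the points are not all identical.
    """
    n = len(points)
    mu = sum(points) / n
    spread = sum((x - mu) ** 2 for x in points)
    return [2.0 / n + 2.0 * (p - mu) ** 2 / spread for p in points]


def sensitivity_sample(
    sensitivities: Sequence[float], m: int, rng: random.Random
) -> Dict[int, float]:
    """Classic sensitivity sampling (not the paper's refined analysis).

    Draws m indices i.i.d. with probability s(p) / t and gives each draw weight
    t / (m * s(p)), which makes the weighted cost an unbiased estimate of the
    full cost for every query. Repeated draws accumulate weight, so the result
    maps point index -> total weight. Points with s(p) = 0 are never drawn.
    """
    t = sum(sensitivities)
    draws = rng.choices(range(len(sensitivities)), weights=sensitivities, k=m)
    weights: Dict[int, float] = {}
    for i in draws:
        weights[i] = weights.get(i, 0.0) + t / (m * sensitivities[i])
    return weights


def is_eps_coreset(
    points: Sequence[float],
    weights: Dict[int, float],
    queries: Sequence[float],
    f: Callable[[float, float], float],
    eps: float,
) -> bool:
    """Check |weighted cost - full cost| <= eps * full cost on each given query.

    The definition quantifies over all queries; testing a finite list can only
    refute the guarantee, never prove it.
    """
    for q in queries:
        full = sum(f(p, q) for p in points)
        approx = sum(w * f(points[i], q) for i, w in weights.items())
        if abs(approx - full) > eps * full:
            return False
    return True


if __name__ == "__main__":
    pts = [float(i % 100) for i in range(10_000)] + [5000.0]  # one far outlier
    sq = lambda p, q: (p - q) ** 2  # illustrative cost f(p, q)
    s = one_mean_sensitivity_bounds(pts)
    cs = sensitivity_sample(s, m=2000, rng=random.Random(0))
    print(len(cs), is_eps_coreset(pts, cs, [-50.0, 0.0, 50.0, 5000.0], sq, eps=0.25))
```

With m = 2000 draws from 10,001 points, the sample keeps at most 2,000 distinct points, and the far outlier receives a large share of the draws because its sensitivity bound is large. Sampling in proportion to sensitivity, rather than uniformly, is what keeps such points from being missed.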
