New Frameworks for Offline and Streaming Coreset Constructions

Vladimir Braverman; Dan Feldman; Harry Lang; Adiel Statman; Samson Zhou

arXiv:1612.00889·cs.DS·September 20, 2022

TL;DR

This paper introduces improved frameworks for constructing coresets in offline and streaming settings, reducing the size dependence from quadratic to near-linear in total sensitivity, enabling more space-efficient solutions for various machine learning problems.

Contribution

The paper presents a novel reduction to the sample complexity of learning functions with bounded VC dimension, improving coreset size bounds and generalizing sensitivity sampling methods.
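For context, the Python sketch below shows the classical sensitivity (importance) sampling scheme that this line of work builds on and generalizes. It is not the paper's new framework. It assumes a finite query set, so that each sensitivity s(p) = max_q f(p, q) / Σ_{p'} f(p', q) can be computed exactly, and it assumes every query has positive total cost. The helper names `sensitivities` and `sensitivity_sample` and the sample size `m` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Cost = Callable[[float, float], float]


def sensitivities(points: Sequence[float], queries: Sequence[float], cost: Cost) -> List[float]:
    """Exact sensitivities over a finite query set (assumes each query's total cost is > 0)."""
    # Total cost of each query over the full point set.
    totals = [sum(cost(p, q) for p in points) for q in queries]
    # s(p): the largest share of any single query's total cost that p accounts for.
    return [max(cost(p, q) / tot for q, tot in zip(queries, totals)) for p in points]


def sensitivity_sample(points: Sequence[float], queries: Sequence[float], cost: Cost,
                       m: int, seed: int = 0) -> Tuple[Dict[int, float], float]:
    """Return (index -> weight, total sensitivity t) from m i.i.d. sensitivity-proportional draws."""
    rng = random.Random(seed)
    s = sensitivities(points, queries, cost)
    t = sum(s)  # total sensitivity
    # Draw indices with probability s(p) / t; zero-sensitivity points are never drawn.
    draws = rng.choices(range(len(points)), weights=s, k=m)
    coreset: Dict[int, float] = {}
    for i in draws:
        # Weight 1 / (m * prob) = t / (m * s(p)) makes the weighted cost unbiased;
        # repeated draws of the same point accumulate weight.
        coreset[i] = coreset.get(i, 0.0) + t / (m * s[i])
    return coreset, t


# Usage on a toy 1-D instance with squared-distance cost.
pts = [0.0, 1.0, 2.0, 5.0]
qs = [0.5, 3.0]
sq = lambda p, q: (p - q) ** 2
coreset, t = sensitivity_sample(pts, qs, sq, m=10)
print(t, coreset)
```

With a single query every draw contributes exactly Σ_p f(p, q) / m, so the estimate is exact; with several queries it is only unbiased, and the number of samples m needed for an ε-coreset is what bounds such as O(t^2) versus O(t log t) (ignoring ε and dimension factors) control.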

Findings

1. Reduced the coreset size bound from O(t^2) to O(t log t) in the total sensitivity t

2. Improved space efficiency for projective clustering and subspace approximation

3. Generalized sensitivity sampling to support non-multiplicative approximations
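To get a feel for the first finding, the sketch below compares the two size bounds as functions of t alone. It is a rough illustration only: it ignores hidden constants and any dependence on ε, dimension, or failure probability, and it uses the natural logarithm (a different base changes only a constant factor). `size_ratio` is a hypothetical helper.

```python
import math


def size_ratio(t: float) -> float:
    """Ratio of the old bound t^2 to the new bound t*log(t), constants ignored (requires t > 1)."""
    return (t * t) / (t * math.log(t))


# The savings factor t / log(t) grows with the total sensitivity t.
for t in (10, 100, 1000):
    print(f"t = {t:>4}: t^2 / (t log t) = {size_ratio(t):.1f}")
```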

Abstract

A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f : P \times Q \to \mathbb{R}$ is a cost function, then a set $S \subseteq P$ with weights $w : P \to [0, \infty)$ is an $\epsilon$-coreset for some parameter $\epsilon > 0$ if $\sum_{s \in S} w(s) f(s, q)$ is a $(1 + \epsilon)$ multiplicative approximation to $\sum_{p \in P} f(p, q)$ for all $q \in Q$. Coresets are used to solve fundamental problems in machine learning under various big data models of computation. Many of the coresets suggested in the recent decade used, or could have used, a general framework for constructing coresets whose size depends quadratically on what is known as the total sensitivity $t$. In this paper we improve this bound from $O(t^{2})$ to $O(t \log t)$. Thus our results imply…
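The ε-coreset condition above can be checked directly when the query set is finite. The Python sketch below rests on two assumptions: Q is a finite list (the definition quantifies over all q ∈ Q, which may be infinite), and the "(1 + ε) multiplicative approximation" is read as the two-sided bound |Σ_{s∈S} w(s) f(s, q) - Σ_{p∈P} f(p, q)| ≤ ε Σ_{p∈P} f(p, q). The name `is_eps_coreset` and the index-to-weight dictionary representation of (S, w) are hypothetical.

```python
from typing import Callable, Dict, Sequence


def is_eps_coreset(points: Sequence[float], weights: Dict[int, float],
                   queries: Sequence[float], cost: Callable[[float, float], float],
                   eps: float) -> bool:
    """Check the two-sided (1 +/- eps) condition for every query in a finite list.

    `weights` maps indices of the chosen subset S to w(s); indices absent from it are not in S.
    """
    for q in queries:
        full = sum(cost(p, q) for p in points)  # cost of the whole set P
        approx = sum(w * cost(points[i], q) for i, w in weights.items())  # weighted cost of S
        if abs(approx - full) > eps * full:
            return False
    return True


# Usage: duplicated points can be merged into one weighted representative.
P = [1.0, 1.0, 3.0, 3.0]
Q = [0.0, 2.0, 3.0]
dist = lambda p, q: abs(p - q)
print(is_eps_coreset(P, {0: 2.0, 2: 2.0}, Q, dist, eps=0.01))  # True
print(is_eps_coreset(P, {0: 4.0}, Q, dist, eps=0.01))  # False
```

Passing this check on a finite sample of queries does not certify the guarantee for all of Q; it only tests the definition on the queries supplied.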
