Introduction to Core-sets: an Updated Survey

Dan Feldman

arXiv:2011.09384·cs.LG·November 19, 2020

TL;DR

This survey reviews the development of core-sets, small data summaries that enable efficient approximate solutions for large-scale optimization and machine learning problems in streaming and distributed environments.

Contribution

It provides a unified and simplified overview of recent core-set constructions with provable size-approximation tradeoffs.

Findings

01. Summarizes key core-set techniques for various optimization problems.

02. Highlights the tradeoff between core-set size and approximation accuracy.

03. Unifies and simplifies existing methods in the literature.

Abstract

In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering problems, the input is a set of points in some metric space, and a common goal is to compute a set of centers in some other space (points, lines) that will minimize the sum of distances to these points. In database queries, we may need to compute such a sum for a specific query set of $k$ centers. However, traditional algorithms cannot handle modern systems that require parallel real-time computations of infinite distributed streams from sensors such as GPS, audio or video that arrive at a cloud, or networks of weaker devices such as smartphones or robots. A core-set is a "small data" summarization of the input "big data", where every possible…
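To make the clustering objective in the abstract concrete, the sketch below evaluates the sum of distances from each point to its nearest center, once on the full input and once on a small weighted subset. The subset here is a plain uniform sample with weights n/m, chosen only for illustration; it is not one of the survey's constructions and carries none of their guarantees. The names `clustering_cost` and `uniform_summary`, the synthetic data, and the sample size are all assumptions.

```python
import math
import random


def clustering_cost(points, centers, weights=None):
    """Weighted sum of distances from each point to its nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(math.dist(p, c) for c in centers)
        for p, w in zip(points, weights)
    )


def uniform_summary(points, m, seed=0):
    """Illustrative summary: m points sampled uniformly without
    replacement, each weighted n/m so the total weight equals n.
    This is a stand-in, not a core-set construction from the survey."""
    rng = random.Random(seed)
    sample = rng.sample(points, m)
    weight = len(points) / m
    return sample, [weight] * m


# Hypothetical input: two Gaussian blobs of 5000 points each.
rng = random.Random(1)
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in [(0, 0), (10, 10)]
    for _ in range(5000)
]

summary, weights = uniform_summary(points, 500)

# A query set of k = 2 centers, evaluated on both data sets.
query = [(0.0, 0.0), (10.0, 10.0)]
full = clustering_cost(points, query)
approx = clustering_cost(summary, query, weights)
print(f"full cost: {full:.1f}, summary estimate: {approx:.1f}")
```

Uniform sampling behaves well on this balanced input, but it can miss small, far-away groups of points that dominate the cost, which is one reason non-uniform constructions are studied.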

Taxonomy

Methods: Greedy Policy Search · Coresets