
TL;DR
This survey reviews the development of core-sets, small data summaries that enable efficient approximate solutions for large-scale optimization and machine learning problems in streaming and distributed environments.
Contribution
It provides a unified and simplified overview of recent core-set constructions with provable size-approximation tradeoffs.
Findings
Summarizes key core-set techniques for various optimization problems.
Highlights the tradeoff between core-set size and approximation accuracy.
Unifies and simplifies existing methods in the literature.
Abstract
In optimization or machine learning problems, we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering problems, the input is a set of points in some metric space, and a common goal is to compute a set of centers in some other space (points, lines) that will minimize the sum of distances to these points. In database queries, we may need to compute such a sum for a specific query set of centers. However, traditional algorithms cannot handle modern systems that require parallel real-time computations of infinite distributed streams from sensors such as GPS, audio or video that arrive at a cloud, or networks of weaker devices such as smartphones or robots. A core-set is a "small data" summarization of the input "big data", where every possible…
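To make the clustering objective in the abstract concrete, here is a minimal Python sketch of the "sum of distances to the nearest center" cost, evaluated both on the full input and on a small weighted summary. The function names `clustering_cost` and `uniform_summary` are hypothetical, and uniform sampling is used only as a simple stand-in: it is not the construction from the survey, and it does not by itself give a guarantee for every query. Constructions with provable guarantees typically rely on more careful, non-uniform sampling.

```python
import math
import random


def clustering_cost(points, weights, centers):
    """Weighted sum of distances from each point to its nearest center."""
    return sum(
        w * min(math.dist(p, c) for c in centers)
        for p, w in zip(points, weights)
    )


def uniform_summary(points, m, seed=0):
    """Hypothetical stand-in for a core-set: m uniformly sampled points,
    each weighted n/m so the total weight matches the input size n.
    Assumption: uniform sampling is for illustration only."""
    rng = random.Random(seed)
    sample = rng.sample(points, m)
    weight = len(points) / m
    return sample, [weight] * m


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.random(), rng.random()) for _ in range(10_000)]
    query_centers = [(0.25, 0.25), (0.75, 0.75)]

    full = clustering_cost(data, [1.0] * len(data), query_centers)
    summary, summary_weights = uniform_summary(data, 200)
    approx = clustering_cost(summary, summary_weights, query_centers)

    print(f"cost on full data:       {full:.2f}")
    print(f"cost on weighted sample: {approx:.2f}")
```

The point of the sketch is the interface rather than the sampling rule: a query (a candidate set of centers) is evaluated on the small weighted set instead of the full input, and the quality of a summary is judged by how closely the two costs agree across queries.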
Taxonomy
Methods: Greedy Policy Search · Coresets
