# Practical Coreset Constructions for Machine Learning

**Authors:** Olivier Bachem, Mario Lucic, Andreas Krause

arXiv: 1703.06476 · 2017-06-06

## TL;DR

This paper reviews and advances methods for constructing small, efficient data summaries called coresets, which enable scalable and provably accurate solutions for various machine learning tasks.

## Contribution

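The paper presents a general framework for coreset construction, applies it to $k$-means clustering, and summarizes existing coreset algorithms for mixture models, Bayesian non-parametric models, principal component analysis, regression, and general empirical risk minimization.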

## Key findings

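- Provides a theoretically sound framework for constructing coresets for general problems, illustrated on $k$-means clustering.
- Summarizes state-of-the-art coreset constructions for mixture models, Bayesian non-parametric models, principal component analysis, regression, and general empirical risk minimization.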
- Demonstrates the effectiveness of coresets in large-scale data analysis.
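
For orientation, the guarantee behind "provably competitive" is commonly written as follows for $k$-means. This is the standard strong-coreset definition in notation assumed here ($X$, $C$, $w$, $Q$, $\varepsilon$ are illustrative symbols, not copied from the paper): a weighted set $C$ with weights $w$ is an $\varepsilon$-coreset of the data set $X$ if, for every set $Q$ of $k$ candidate centers,

$$
\bigl|\operatorname{cost}(X, Q) - \operatorname{cost}(C, Q)\bigr| \le \varepsilon \, \operatorname{cost}(X, Q),
\qquad
\operatorname{cost}(C, Q) = \sum_{c \in C} w(c) \min_{q \in Q} \lVert c - q \rVert_2^2 ,
$$

where $\operatorname{cost}(X, Q)$ is the same sum over $X$ with unit weights. Because the bound holds for all $Q$ simultaneously, a near-optimal solution on $C$ is also near-optimal on $X$.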

## Abstract

We investigate coresets: succinct, small summaries of large data sets, constructed so that solutions found on the summary are provably competitive with the solution found on the full data set. We provide an overview of the state of the art in coreset construction for machine learning. In Section 2, we present both the intuition behind and a theoretically sound framework to construct coresets for general problems and apply it to $k$-means clustering. In Section 3, we summarize existing coreset construction algorithms for a variety of machine learning problems such as maximum likelihood estimation of mixture models, Bayesian non-parametric models, principal component analysis, regression and general empirical risk minimization.
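
To make the idea of a weighted summary concrete, here is a minimal sketch of an importance-sampling coreset for $k$-means. It is an illustration under stated assumptions, not necessarily the construction analyzed in the paper: the sampling distribution mixes uniform sampling with squared distance to the data mean (in the style of "lightweight" coresets), and the function names `importance_sampling_coreset` and `kmeans_cost` are hypothetical.

```python
import numpy as np


def importance_sampling_coreset(X, m, rng=None):
    """Sample m weighted points from X (shape n x d).

    Assumed scheme (illustrative): q(x) = 1/(2n) + d(x, mean)^2 / (2 * sum of d^2).
    Each sampled point gets weight 1 / (m * q(x)), which makes the weighted
    cost an unbiased estimate of the full-data cost for any fixed centers.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    mean = X.mean(axis=0)
    sq_dist = ((X - mean) ** 2).sum(axis=1)
    total = sq_dist.sum()
    if total == 0.0:
        # All points coincide; fall back to uniform sampling.
        q = np.full(n, 1.0 / n)
    else:
        q = 0.5 / n + 0.5 * sq_dist / total
    idx = rng.choice(n, size=m, replace=True, p=q)
    weights = 1.0 / (m * q[idx])
    return X[idx], weights


def kmeans_cost(X, centers, weights=None):
    """Weighted sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    if weights is None:
        return d2.sum()
    return (weights * d2).sum()
```

A typical use would be to draw `C, w = importance_sampling_coreset(X, m)` once, run a weighted $k$-means solver on `(C, w)`, and compare `kmeans_cost(X, centers)` with `kmeans_cost(C, centers, w)` to check how closely the summary tracks the full data. The quality of the approximation depends on `m` and on the data, so this sketch carries no guarantee by itself.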

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.06476/full.md

---
Source: https://tomesphere.com/paper/1703.06476