Faster PAC Learning and Smaller Coresets via Smoothed Analysis

Alaa Maalouf; Ibrahim Jubran; Murad Tukan; Dan Feldman

arXiv:2006.05441 · cs.LG · June 11, 2020 · 5 cites


Open Access

TL;DR

This paper introduces a smoothed-analysis approach to PAC learning and coreset construction. Instead of the worst-case error over all queries, it targets the average error, which yields smaller subsets whose size is independent of the number of input items. The approach is supported by algorithms and experiments.

Contribution

It generalizes coreset construction, inspired by smoothed analysis, to approximate the average error over the queries. The resulting subsets are smaller and more efficient, with size independent of the input size.

Findings

01

Algorithms for coresets with size independent of n

02

Improved coreset constructions for streaming vector summarization

03

Experimental validation with open source code

Abstract

PAC-learning usually aims to compute a small subset ($ε$-sample/net) from $n$ items that provably approximates a given loss function for every query (model, classifier, hypothesis) from a given set of queries, up to an additive error $ε \in (0, 1)$. Coresets generalize this idea to support a multiplicative error $1 \pm ε$. Inspired by smoothed analysis, we suggest a natural generalization: approximate the average (instead of the worst-case) error over the queries, in the hope of getting smaller subsets. Because the errors of different queries are dependent, we can no longer apply the Chernoff-Hoeffding inequality to each fixed query and then combine the results via the VC-dimension or a union bound. This paper provides deterministic and randomized algorithms for computing such coresets and $ε$-samples of size independent of $n$, for any finite set of…
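The worst-case versus average distinction above can be made concrete with a small experiment. The following is a minimal Python sketch under illustrative assumptions: one-dimensional items, a finite query set, and a hypothetical bounded loss $1 - e^{-|p - q|}$. It takes a plain uniform random sample, which is a baseline and not the authors' deterministic or randomized constructions. For that sample it reports both the worst-case additive error over all queries and the average error. It also computes the sample size that the classic per-query Hoeffding bound plus a union bound would require. The functions `loss`, `approximation_errors`, and `hoeffding_union_size` are hypothetical names chosen for illustration.

```python
import math
import random


def loss(points, q):
    # Hypothetical bounded loss: average of 1 - exp(-|p - q|) over the items,
    # so every per-item term (and the mean) lies in [0, 1).
    return sum(1.0 - math.exp(-abs(p - q)) for p in points) / len(points)


def approximation_errors(points, sample, queries):
    # Additive error |loss(P, q) - loss(S, q)| of the subset S for each query q.
    return [abs(loss(points, q) - loss(sample, q)) for q in queries]


def hoeffding_union_size(eps, delta, num_queries):
    # Hoeffding: for a fixed query, a uniform i.i.d. sample of size m with losses
    # in [0, 1] misses the mean by >= eps with probability <= 2 exp(-2 m eps^2).
    # Union bound over the finite query set: require 2 |Q| exp(-2 m eps^2) <= delta.
    return math.ceil(math.log(2 * num_queries / delta) / (2 * eps ** 2))


if __name__ == "__main__":
    rng = random.Random(0)
    P = [rng.gauss(0.0, 1.0) for _ in range(5000)]    # n = 5000 input items
    Q = [rng.uniform(-3.0, 3.0) for _ in range(200)]  # finite query set
    S = rng.choices(P, k=100)                         # uniform sample, with replacement
    errs = approximation_errors(P, S, Q)
    print("worst-case error:", max(errs))
    print("average error:   ", sum(errs) / len(errs))
    print("union-bound size (eps=0.05, delta=0.1):",
          hoeffding_union_size(0.05, 0.1, len(Q)))
```

With these settings the last line prints 1659. The union-bound size grows with $\ln |Q|$, so ten times as many queries adds roughly $\ln(10)/(2ε^2) \approx 461$ samples. The average error can never exceed the worst-case error, which is why asking only for the average leaves room for smaller subsets.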


Taxonomy

Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Domain Adaptation and Few-Shot Learning

Methods: Coresets