Faster PAC Learning and Smaller Coresets via Smoothed Analysis
Alaa Maalouf, Ibrahim Jubran, Murad Tukan, Dan Feldman

TL;DR
This paper applies smoothed analysis to PAC learning and coreset construction: it approximates the average error over the queries rather than the worst-case error, which yields smaller subsets whose size is independent of the number of input items. The approach is supported by algorithms and experiments.
Contribution
It generalizes coreset construction by approximating the average error over the queries, instead of the worst-case error, in the spirit of smoothed analysis. The resulting subsets are smaller, and their size is independent of the input size n.
Findings
Algorithms for coresets with size independent of n
Improved coreset constructions for streaming vector summarization
Experimental validation with open source code
Abstract
PAC-learning usually aims to compute a small subset ($\varepsilon$-sample/net) from $n$ items, that provably approximates a given loss function for every query (model, classifier, hypothesis) from a given set of queries, up to an additive error $\varepsilon$. Coresets generalize this idea to support multiplicative error $1 \pm \varepsilon$. Inspired by smoothed analysis, we suggest a natural generalization: approximate the \emph{average} (instead of the worst-case) error over the queries, in the hope of getting smaller subsets. The dependency between errors of different queries implies that we may no longer apply the Chernoff-Hoeffding inequality for a fixed query, and then use the VC-dimension or union bound. This paper provides deterministic and randomized algorithms for computing such coresets and $\varepsilon$-samples of size independent of $n$, for any finite set of…
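To make the distinction between worst-case and average error concrete, the following is a minimal Python sketch, not the paper's algorithm. It draws a plain uniform random sample, measures the additive error of that sample for every query in a finite query set, and then reports both the maximum and the arithmetic mean of those errors. The threshold loss, the query grid, the sample size, and the helper name `query_errors` are all assumptions chosen for illustration, and the paper's exact definition of average error may differ from the simple mean used here.

```python
import random


def query_errors(items, sample, queries, loss):
    """Return, per query, |mean loss over items - mean loss over sample|."""
    errors = []
    for q in queries:
        full = sum(loss(p, q) for p in items) / len(items)
        approx = sum(loss(p, q) for p in sample) / len(sample)
        errors.append(abs(full - approx))
    return errors


def threshold_loss(p, q):
    """Hypothetical loss in [0, 1]: 1 if the item is at most the query threshold."""
    return 1.0 if p <= q else 0.0


rng = random.Random(0)                       # fixed seed for reproducibility
items = [rng.random() for _ in range(2000)]  # n = 2000 items in [0, 1]
queries = [i / 20 for i in range(21)]        # finite query set: 21 thresholds
sample = rng.sample(items, 50)               # uniform sample (illustrative only)

errors = query_errors(items, sample, queries, threshold_loss)
worst_case_error = max(errors)               # what an epsilon-sample must bound
average_error = sum(errors) / len(errors)    # the relaxed, averaged target

print(f"worst-case error: {worst_case_error:.4f}")
print(f"average error:    {average_error:.4f}")
```

Because a mean never exceeds a maximum, the average error is at most the worst-case error for any sample, which is why targeting it can plausibly permit smaller subsets.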
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Domain Adaptation and Few-Shot Learning
Methods: Coresets
