Coresets for Kernel Regression
Yan Zheng, Jeff M. Phillips

TL;DR
This paper introduces coresets for kernel regression that enable efficient approximation of large datasets with bounded error, significantly reducing computation time for non-parametric data analysis tasks.
Contribution
The paper presents a coreset construction for kernel regression whose size is independent of the number of data points and which comes with provable worst-case error bounds, making kernel regression practical on large-scale data.
Findings
Coresets achieve negligible approximation error in experiments.
Construction of coresets is highly efficient and scalable.
Significant computational speedups are demonstrated on large datasets.
Abstract
Kernel regression is an essential and ubiquitous tool for non-parametric data analysis, particularly popular for time series and spatial data. However, the central operation, which is performed many times, is evaluating a kernel on the data set, and this takes linear time. This is impractical for modern large data sets. In this paper we describe coresets for kernel regression: compressed data sets which can be used as a proxy for the original data and have provably bounded worst-case error. The size of the coresets is independent of the raw number of data points; rather, it depends only on the error guarantee and, in some cases, on the size of the domain and the amount of smoothing. We evaluate our methods on very large time series and spatial data, and demonstrate that they incur negligible error, can be constructed extremely efficiently, and allow for great computational gains.
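To make the linear-time bottleneck concrete, here is a minimal sketch of a kernel regression query, assuming the common Nadaraya-Watson form with a Gaussian kernel (the abstract does not fix a kernel or estimator). The helper `uniform_sample_proxy` is a hypothetical stand-in that draws a plain random subset; it is not the paper's coreset construction and carries no worst-case guarantee. It only illustrates how a smaller proxy data set replaces the full data at query time.

```python
import numpy as np


def gaussian_kernel(x, q, bandwidth):
    """Gaussian kernel values between 1-D data points x and a query q."""
    return np.exp(-((x - q) ** 2) / (2.0 * bandwidth ** 2))


def kernel_regression(x, y, q, bandwidth):
    """Nadaraya-Watson estimate at q: a kernel-weighted average of y.

    Every query touches all of x, so the cost is linear in len(x).
    """
    k = gaussian_kernel(x, q, bandwidth)
    denom = k.sum()
    if denom == 0.0:
        raise ValueError("query is too far from all data points")
    return float((k * y).sum() / denom)


def uniform_sample_proxy(x, y, size, seed=0):
    """Hypothetical stand-in for a coreset: a uniform random subset.

    Assumption for illustration only; not the construction from the paper.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=size, replace=False)
    return x[idx], y[idx]


# Synthetic 1-D data: a noisy sine curve.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=20000)
y = np.sin(x) + rng.normal(0.0, 0.1, size=x.shape)

# Query the full data and a proxy one tenth of its size.
xs, ys = uniform_sample_proxy(x, y, size=2000)
full = kernel_regression(x, y, q=5.0, bandwidth=0.5)
approx = kernel_regression(xs, ys, q=5.0, bandwidth=0.5)
print(f"full data: {full:.4f}  proxy: {approx:.4f}")
```

Each query against the proxy costs time proportional to its size rather than to the size of the original data, which is where the speedup comes from. A true coreset would additionally bound the gap between the two estimates in the worst case over all queries.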
Taxonomy
Topics: Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques · Machine Learning and Data Classification
