Coresets for Kernel Regression

Yan Zheng; Jeff M. Phillips

arXiv:1702.03644 · cs.LG · June 1, 2017 · 1 citation

Open Access

TL;DR

This paper introduces coresets for kernel regression that enable efficient approximation of large datasets with bounded error, significantly reducing computation time for non-parametric data analysis tasks.

Contribution

The paper presents a coreset construction for kernel regression whose coreset size is independent of the number of data points and which comes with provable worst-case error bounds, improving efficiency on large-scale data.

Findings

01

Coresets achieve negligible approximation error in experiments.

02

Construction of coresets is highly efficient and scalable.

03

Significant computational speedups are demonstrated on large datasets.

Abstract

Kernel regression is an essential and ubiquitous tool for non-parametric data analysis, particularly popular for time series and spatial data. However, the central operation, which is performed many times, is evaluating a kernel on the data set, and it takes linear time. This is impractical for modern large data sets. In this paper we describe coresets for kernel regression: compressed data sets which can be used as a proxy for the original data and have provably bounded worst-case error. The size of the coresets is independent of the raw number of data points; rather, it depends only on the error guarantee and, in some cases, the size of the domain and the amount of smoothing. We evaluate our methods on very large time series and spatial data, and demonstrate that they incur negligible error, can be constructed extremely efficiently, and allow for great computational gains.
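
As a rough illustration of the linear-time cost the abstract describes, the sketch below evaluates a Nadaraya-Watson kernel regression estimate with a Gaussian kernel, first on a full synthetic data set and then on a much smaller proxy. The proxy here is a plain uniform random subset, used only as a stand-in: it is not the coreset construction from the paper and carries none of its error guarantees. The function names (`gaussian_kernel`, `kernel_regression`, `uniform_sample_proxy`), the bandwidth of 0.5, and the synthetic sine data are all assumptions made for this example.

```python
import math
import random


def gaussian_kernel(x, q, bandwidth):
    """Gaussian kernel weight between data point x and query q."""
    return math.exp(-((x - q) ** 2) / (2.0 * bandwidth ** 2))


def kernel_regression(points, q, bandwidth):
    """Nadaraya-Watson estimate at q: one pass over all points (linear time)."""
    num = 0.0
    den = 0.0
    for x, y in points:
        w = gaussian_kernel(x, q, bandwidth)
        num += w * y
        den += w
    return num / den if den > 0.0 else 0.0


def uniform_sample_proxy(points, size, seed=0):
    """Hypothetical stand-in for a coreset: a uniform random subset.

    This is NOT the paper's construction and has no worst-case guarantee.
    """
    rng = random.Random(seed)
    return rng.sample(points, size)


# Synthetic 1-D data (an assumption for illustration): y = sin(x) + noise
rng = random.Random(42)
data = []
for _ in range(20000):
    x = rng.uniform(0.0, 10.0)
    data.append((x, math.sin(x) + rng.gauss(0.0, 0.1)))

proxy = uniform_sample_proxy(data, 2000)

# Compare the two estimates on a grid of query points
queries = [0.5 * i for i in range(1, 20)]
max_gap = max(
    abs(kernel_regression(data, q, 0.5) - kernel_regression(proxy, q, 0.5))
    for q in queries
)
print(f"full size={len(data)}, proxy size={len(proxy)}, max gap={max_gap:.4f}")
```

Each query against the proxy touches 2,000 points instead of 20,000, which is where the savings come from when the same data set is queried many times. A random sample only gives a probabilistic gap on typical queries; the point of the coresets described in the abstract is to bound the error in the worst case.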

Taxonomy

Topics: Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques · Machine Learning and Data Classification