Fast Gaussian Process Regression for Big Data
Sourish Das, Sasanka Roy, Rajiv Sambasivan

TL;DR
This paper introduces a scalable Gaussian Process regression algorithm that uses subset sampling and bagging, enabling effective modeling of large datasets where traditional methods are computationally infeasible.
Contribution
The paper proposes a novel subset sampling algorithm for Gaussian Process regression that is effective for large datasets and compares favorably with existing scalable methods.
Findings
The algorithm performs comparably to stochastic variational and sparse Gaussian processes.
The method is effective for problems where an additive model fits well and the response depends on only a few features.
Stacking the resulting model with other large-scale regression methods can improve performance.
Abstract
Gaussian Processes are widely used for regression tasks. A known limitation in the application of Gaussian Processes to regression tasks is that the computation of the solution requires performing a matrix inversion. The solution also requires the storage of a large matrix in memory. These factors restrict the application of Gaussian Process regression to small and moderate size data sets. We present an algorithm that combines estimates from models developed using subsets of the data obtained in a manner similar to the bootstrap. The sample size is a critical parameter for this algorithm. Guidelines for reasonable choices of algorithm parameters, based on detailed experimental study, are provided. Various techniques have been proposed to scale Gaussian Processes to large scale regression tasks. The most appropriate choice depends on the problem context. The proposed method is most…
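The core idea in the abstract, fitting Gaussian Process models on small random subsets and combining their estimates, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scalar inputs, a squared-exponential kernel with fixed hyperparameters, subsets drawn without replacement, and a plain average as the combining rule. All function names and default values (`subset_size`, `n_models`, `noise`) are hypothetical, and the code uses only the standard library to stay dependency-free. Each model solves a linear system of size `subset_size` instead of one the size of the full data set, which is where the savings come from.

```python
import math
import random


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel for scalar inputs (assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_gp(xs, ys, noise=0.1):
    """Return alpha = (K + noise^2 I)^{-1} y for a zero-mean GP on one subset."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, ys)


def predict_gp(xs, alpha, x_new):
    """Posterior mean at x_new: k(x_new, X) @ alpha."""
    return sum(a * rbf(x, x_new) for x, a in zip(xs, alpha))


def subset_gp_predict(xs, ys, x_new, subset_size=30, n_models=10, seed=0):
    """Average the posterior means of GPs fitted on random subsets.

    Assumption: subsets are sampled without replacement and combined by a
    simple mean; the paper's exact sampling and combining rules may differ.
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.sample(range(len(xs)), subset_size)
        sub_x = [xs[i] for i in idx]
        sub_y = [ys[i] for i in idx]
        alpha = fit_gp(sub_x, sub_y)
        preds.append(predict_gp(sub_x, alpha, x_new))
    return sum(preds) / len(preds)


# Usage on synthetic data: noisy samples of sin(x) on [0, 2*pi].
data_rng = random.Random(42)
xs = [data_rng.uniform(0.0, 2.0 * math.pi) for _ in range(2000)]
ys = [math.sin(x) + data_rng.gauss(0.0, 0.1) for x in xs]
print(subset_gp_predict(xs, ys, 1.0), math.sin(1.0))
```

With 2000 points, a single full GP would need a 2000 x 2000 kernel matrix, while the sketch only ever builds 30 x 30 matrices. As the abstract notes, the sample size is the critical parameter: larger subsets cost more per model but usually reduce the error of each estimate.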
Taxonomy
Methods: Gaussian Process
