Leveraging Locality and Robustness to Achieve Massively Scalable Gaussian Process Regression
Robert Allison, Anthony Stephenson, Samuel F, Edward Pyzer-Knapp

TL;DR
This paper presents a scalable Gaussian Process regression method that leverages locality and robustness, achieving high accuracy and well-calibrated uncertainty estimates with significantly reduced computational costs on large datasets.
Contribution
It introduces a new perspective on GP nearest-neighbour prediction, demonstrating that accuracy becomes insensitive to model misspecification as data size grows, enabling efficient large-scale regression.
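For orientation, GP nearest-neighbour prediction can be written as the standard GP predictive equations restricted to a neighbour set. The notation below is assumed for illustration and is not taken from the paper: a zero-mean prior with kernel $k$, noise variance $\sigma^2$, and $m$ nearest neighbours of the test input $x_*$ with targets $y_N$.

```latex
\begin{aligned}
\mu(x_*) &= k_*^\top \left(K_N + \sigma^2 I_m\right)^{-1} y_N, \\
s^2(x_*) &= k(x_*, x_*) - k_*^\top \left(K_N + \sigma^2 I_m\right)^{-1} k_* + \sigma^2 .
\end{aligned}
```

Here $K_N$ is the $m \times m$ kernel matrix over the neighbours and $k_*$ is the vector of kernel values between $x_*$ and each neighbour. Under these assumptions the linear solve costs $O(m^3)$ per test point, independent of $n$, and the noise variance $\sigma^2$ enters the predictive variance $s^2(x_*)$ additively.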
Findings
High predictive accuracy with minimal parameter tuning
Robust uncertainty calibration despite model misspecification
Training on 1.6 million data points in about 30 seconds
Abstract
The accurate predictions and principled uncertainty measures provided by GP regression incur O(n^3) cost which is prohibitive for modern-day large-scale applications. This has motivated extensive work on computationally efficient approximations. We introduce a new perspective by exploring robustness properties and limiting behaviour of GP nearest-neighbour (GPnn) prediction. We demonstrate through theory and simulation that as the data-size n increases, accuracy of estimated parameters and GP model assumptions become increasingly irrelevant to GPnn predictive accuracy. Consequently, it is sufficient to spend small amounts of work on parameter estimation in order to achieve high MSE accuracy, even in the presence of gross misspecification. In contrast, as n tends to infinity, uncertainty calibration and NLL are shown to remain sensitive to just one parameter, the additive noise-variance;…
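The nearest-neighbour prediction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the squared-exponential kernel, the brute-force neighbour search, the function names `rbf_kernel` and `gpnn_predict`, and all default hyperparameter values are hypothetical choices made for this example.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale, signal_var):
    """Squared-exponential kernel between rows of a and b (assumed kernel choice)."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gpnn_predict(x_train, y_train, x_test, m=50,
                 lengthscale=1.0, signal_var=1.0, noise_var=0.1):
    """Predict each test point from a GP fitted to its m nearest training points.

    Assumes a zero-mean prior. The neighbour search is brute force, O(n) per
    test point; a real large-scale system would use a spatial index instead.
    """
    means = np.empty(len(x_test))
    variances = np.empty(len(x_test))
    for i, xs in enumerate(x_test):
        # Indices of the m training points closest to xs (unordered).
        d2 = np.sum((x_train - xs) ** 2, axis=1)
        idx = np.argpartition(d2, m - 1)[:m]
        xn, yn = x_train[idx], y_train[idx]

        # m x m noisy kernel matrix and cross-covariance vector.
        k_nn = rbf_kernel(xn, xn, lengthscale, signal_var) + noise_var * np.eye(m)
        k_star = rbf_kernel(xn, xs[None, :], lengthscale, signal_var)[:, 0]

        # Cholesky-based solve: O(m^3) per test point, independent of n.
        chol = np.linalg.cholesky(k_nn)
        alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, yn))
        v = np.linalg.solve(chol, k_star)

        means[i] = k_star @ alpha
        # Predictive variance of a noisy observation at xs.
        variances[i] = signal_var - v @ v + noise_var
    return means, variances
```

In this sketch `noise_var` is added directly to the predictive variance, which gives some intuition for why calibration would remain sensitive to that parameter while the predictive mean depends on the hyperparameters only through the local solve.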
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Machine Learning and Data Classification
