The Theory and Practice of Highly Scalable Gaussian Process Regression with Nearest Neighbours

Robert Allison; Tomasz Maciazek; Anthony Stephenson

arXiv:2604.07267·stat.ML·April 9, 2026

TL;DR

This paper develops a rigorous large-$n$ theory for nearest-neighbour Gaussian process regression (NNGP/GPnn). It establishes almost sure limits of the main predictive criteria, consistency and minimax-rate convergence of the risk, and asymptotic robustness of hyper-parameter derivatives, giving a theoretical foundation for using these methods on large datasets.

Contribution

The paper gives a theoretical analysis of NNGP/GPnn regression, proving consistency, convergence rates, and robustness properties that had previously been observed empirically but not rigorously justified.

Findings

01 Proves almost sure limits for MSE, CAL, and NLL in NNGP/GPnn.

02 Shows universal consistency and minimax rate of convergence for the risk.

03 Establishes asymptotic robustness of hyper-parameter derivatives.
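
To make the three criteria in the first finding concrete, here is a minimal sketch of how MSE, CAL, and NLL might be computed from Gaussian predictive means and variances. The function name `predictive_metrics` is hypothetical, and the definition of CAL as the mean squared standardized residual is an assumption, since this summary does not spell it out.

```python
import numpy as np

def predictive_metrics(y, mu, var):
    """Return (MSE, CAL, NLL) averaged over test points.

    y   : observed test responses
    mu  : predictive means
    var : predictive variances (must be positive)
    CAL is assumed here to be the mean of (y - mu)^2 / var; a value
    near 1 would indicate well-calibrated predictive variances.
    """
    y, mu, var = (np.asarray(a, dtype=float) for a in (y, mu, var))
    sq_err = (y - mu) ** 2
    mse = sq_err.mean()
    cal = (sq_err / var).mean()
    # Gaussian negative log-likelihood per test point, then averaged.
    nll = 0.5 * (np.log(2.0 * np.pi * var) + sq_err / var).mean()
    return mse, cal, nll
```

Under this assumed definition, MSE depends only on the predictive mean, while CAL and NLL also depend on the predictive variance.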

Abstract

Gaussian process (GP) regression is a widely used non-parametric modeling tool, but its cubic complexity in the training size limits its use on massive data sets. A practical remedy is to predict using only the nearest neighbours of each test point, as in Nearest Neighbour Gaussian Process (NNGP) regression for geospatial problems and the related scalable GPnn method for more general machine-learning applications. Despite their strong empirical performance, the large-$n$ theory of NNGP/GPnn remains incomplete. We develop a theoretical framework for NNGP and GPnn regression. Under mild regularity assumptions, we derive almost sure pointwise limits for three key predictive criteria: mean squared error (MSE), calibration coefficient (CAL), and negative log-likelihood (NLL). We then study the $L_{2}$-risk, prove universal consistency, and show that the risk attains Stone's…
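
The core idea in the abstract, predicting from only the nearest neighbours of each test point, can be sketched as follows. This is an illustrative sketch and not the authors' implementation: the squared-exponential kernel, the zero prior mean, the fixed hyper-parameters, the brute-force neighbour search, and the names `rbf_kernel` and `nn_gp_predict` are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel matrix between rows of a and rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def nn_gp_predict(X, y, x_star, m=50, lengthscale=1.0,
                  signal_var=1.0, noise_var=0.1):
    """Zero-mean GP prediction at x_star from its m nearest training points.

    Returns the predictive mean and the predictive variance of a noisy
    observation at x_star. The solve costs O(m^3) per test point rather
    than O(n^3) for the full training set.
    """
    m = min(m, len(X))
    # Brute-force Euclidean neighbour search (assumed for simplicity).
    d2 = ((X - x_star) ** 2).sum(axis=1)
    idx = np.argpartition(d2, m - 1)[:m]
    Xn, yn = X[idx], y[idx]

    K = rbf_kernel(Xn, Xn, lengthscale, signal_var) + noise_var * np.eye(m)
    k_star = rbf_kernel(Xn, x_star[None, :], lengthscale, signal_var)[:, 0]

    # Cholesky-based solves for numerical stability.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yn))
    v = np.linalg.solve(L, k_star)

    mean = k_star @ alpha
    var = signal_var - v @ v + noise_var
    return mean, var
```

With `m` fixed, the cost per prediction is independent of the training size apart from the neighbour search, which in practice would likely use a spatial index or approximate search rather than the brute-force scan shown here.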
