Parallelizing Gaussian Process Calculations in R
Christopher J. Paciorek, Benjamin Lipshitz, Wei Zhuo, Prabhat, Cari G. Kaufman, Rollin C. Thomas

TL;DR
This paper introduces bigGP, an R package that scales Gaussian process computations to large datasets by distributing the underlying linear algebra across multiple processes, using a hybrid of threading and message passing.
Contribution
The paper presents a novel hybrid parallelization approach integrated into an R package for Gaussian process calculations, combining threading and message-passing for improved scalability.
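The "Gaussian process calculations" in question are dominated by dense linear algebra on an n x n covariance matrix: a Cholesky factorization (O(n^3) time, O(n^2) memory) followed by triangular solves. The serial, pure-Python sketch below shows that pipeline for the Gaussian log-likelihood. It is an illustration only, not bigGP's API: the squared-exponential kernel and the parameter names `sigma2`, `rho`, and `nugget` are assumptions chosen for the example.

```python
import math


def sq_exp_cov(xs, sigma2=1.0, rho=1.0, nugget=1e-6):
    """Squared-exponential covariance for 1-D locations xs (assumed kernel)."""
    n = len(xs)
    return [[sigma2 * math.exp(-((xs[i] - xs[j]) ** 2) / (2.0 * rho ** 2))
             + (nugget if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Return lower-triangular L with a = L L^T (O(n^3) time, O(n^2) memory)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b by forward substitution (L lower triangular)."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def gp_loglik(y, xs, **cov_params):
    """Log-density of y under N(0, C), with C built from the locations xs."""
    n = len(y)
    L = cholesky(sq_exp_cov(xs, **cov_params))
    z = forward_solve(L, y)  # z = L^{-1} y, so y^T C^{-1} y = z^T z
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|C| from diag(L)
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + sum(v * v for v in z))
```

For example, `gp_loglik([0.3, -0.1, 0.4], [0.0, 0.5, 1.0], rho=0.5)` evaluates the likelihood at three locations. At n = 67,275 the covariance matrix alone holds roughly 4.5 billion entries, which is why the factorization and solves are the operations worth distributing.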
Findings
Successfully analyzed an astrophysics dataset with 67,275 observations.
Achieved balanced computational load and limited communication overhead.
Demonstrated the package's ease of use for R programmers without C or MPI expertise.
Abstract
We consider parallel computation for Gaussian process calculations to overcome computational and memory constraints on the size of datasets that can be analyzed. Using a hybrid parallelization approach that uses both threading (shared memory) and message-passing (distributed memory), we implement the core linear algebra operations used in spatial statistics and Gaussian process regression in an R package called bigGP that relies on C and MPI. The approach divides the matrix into blocks such that the computational load is balanced across processes while communication between processes is limited. The package provides an API enabling R programmers to implement Gaussian process-based methods by using the distributed linear algebra operations without any C or MPI coding. We illustrate the approach and software by analyzing an astrophysics dataset with n=67,275 observations.
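The abstract's key idea is to split the matrix into blocks that are spread across processes. The serial, pure-Python sketch below shows a generic tiled (block) Cholesky factorization that stores only lower-triangular tiles and assigns each tile to one of `P` simulated processes. It is not bigGP's data layout or API. The round-robin tile-to-process assignment, the block size `b`, and every function name are hypothetical. A real distributed version would also keep each tile in its owner's memory and exchange tiles with MPI, rather than sharing one dictionary.

```python
import math


def chol(a):
    """Unblocked Cholesky of one diagonal tile: a = L L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def solve_right_lt(B, L):
    """Return X with X L^T = B, i.e. X = B L^{-T}, for lower-triangular L."""
    X = []
    for row in B:
        x = [0.0] * len(L)
        for j in range(len(L)):
            x[j] = (row[j] - sum(x[k] * L[j][k] for k in range(j))) / L[j][j]
        X.append(x)
    return X


def subtract_abt(C, A, B):
    """In-place update C -= A B^T on square tiles."""
    for r in range(len(C)):
        for c in range(len(C)):
            C[r][c] -= sum(A[r][k] * B[c][k] for k in range(len(A[0])))


def blocked_cholesky(A, b, P):
    """Right-looking tiled Cholesky of A (n x n, n divisible by b).

    Only lower-triangular tiles are stored. Each tile is owned by one of P
    simulated processes (hypothetical round-robin layout); work[p] counts
    the tile operations that process p would perform.
    """
    nb = len(A) // b
    tiles = [(i, j) for j in range(nb) for i in range(j, nb)]
    owner = {t: r % P for r, t in enumerate(tiles)}
    T = {(i, j): [row[j * b:(j + 1) * b] for row in A[i * b:(i + 1) * b]]
         for (i, j) in tiles}  # row slices copy, so A is not modified
    work = [0] * P
    for k in range(nb):
        T[k, k] = chol(T[k, k])                    # factor diagonal tile
        work[owner[k, k]] += 1
        for i in range(k + 1, nb):                 # panel: L_ik = A_ik L_kk^{-T}
            T[i, k] = solve_right_lt(T[i, k], T[k, k])
            work[owner[i, k]] += 1
        for j in range(k + 1, nb):                 # trailing update: A_ij -= L_ik L_jk^T
            for i in range(j, nb):
                subtract_abt(T[i, j], T[i, k], T[j, k])
                work[owner[i, j]] += 1
    n = nb * b
    L = [[0.0] * n for _ in range(n)]
    for (i, j), t in T.items():                    # reassemble the factor for checking
        for r in range(b):
            for c in range(b):
                L[i * b + r][j * b + c] = t[r][c]
    return L, work
```

For a 4 x 4 matrix with `b = 2` and `P = 3`, the three stored tiles land on three different processes. Process 2, however, performs two operations (an update and then a factorization of tile (1, 1)), while the others perform one each. Counting operations per owner in this way is a quick check of computational balance. Assessing communication would additionally require tracking which tiles each step reads from other owners.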
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
