GPU-Resident Gaussian Process Regression Leveraging Asynchronous Tasks with HPX
Henrik Möllmann, Dirk Pflüger, Alexander Strack

TL;DR
This paper extends the GPRat library for Gaussian process regression with a fully GPU-resident prediction pipeline that combines asynchronous HPX tasks with optimized CUDA libraries, reaching speedups of up to 4.6 over the existing CPU implementation on larger datasets.
Contribution
It extends GPRat with a GPU-resident prediction pipeline using CUDA and HPX, enabling faster Gaussian process regression on large datasets.
Findings
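Compared to the existing CPU implementation, the GPU implementation achieves speedups of up to 4.3 for the Cholesky decomposition and up to 4.6 for GP prediction.
The GPU implementation outperforms the CPU implementation for datasets larger than 128 training samples.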
Combining HPX with CUDA streams outperforms cuSOLVER by up to 11%.
Abstract
Gaussian processes (GPs) are a widely used regression tool, but the cubic complexity of exact solvers limits their scalability. To address this challenge, we extend the GPRat library by incorporating a fully GPU-resident GP prediction pipeline. GPRat is an HPX-based library that combines task-based parallelism with an intuitive Python API. We implement tiled algorithms for the GP prediction using optimized CUDA libraries, thereby exploiting massive parallelism for linear algebra operations. We evaluate the optimal number of CUDA streams and compare the performance of our GPU implementation to the existing CPU-based implementation. Our results show the GPU implementation provides speedups for datasets larger than 128 training samples. We observe speedups of up to 4.3 for the Cholesky decomposition itself and 4.6 for the GP prediction. Furthermore, combining HPX with multiple CUDA…
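To illustrate what a tiled algorithm for the Cholesky decomposition looks like, here is a minimal pure-Python sketch of a right-looking tiled factorization. It is not GPRat's code or API: the function names, the squared-exponential kernel, and its hyperparameters are hypothetical choices made for this example, and the matrix size is assumed to be a multiple of the tile size. Each tile operation (POTRF, TRSM, SYRK/GEMM) is the kind of independent unit that a task-based runtime could schedule asynchronously and that a GPU library could execute on its own stream.

```python
import math

def rbf_kernel_matrix(xs, lengthscale=1.0, noise=0.1):
    """Hypothetical covariance matrix: squared-exponential kernel plus noise on the diagonal."""
    return [[math.exp(-0.5 * ((a - b) / lengthscale) ** 2) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]

def cholesky_tile(a):
    """POTRF: unblocked Cholesky of one SPD tile, returning lower-triangular L."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def solve_right_lower_t(l, b):
    """TRSM: return X such that X @ L^T = B for lower-triangular L."""
    n = len(l)
    x = [[0.0] * n for _ in b]
    for r, row in enumerate(b):
        for j in range(n):
            s = row[j] - sum(x[r][k] * l[j][k] for k in range(j))
            x[r][j] = s / l[j][j]
    return x

def sub_matmul_t(c, a, b):
    """SYRK/GEMM update: return C - A @ B^T."""
    inner = len(a[0])
    return [[c[i][j] - sum(a[i][k] * b[j][k] for k in range(inner))
             for j in range(len(c[0]))] for i in range(len(c))]

def tiled_cholesky(a, t):
    """Right-looking tiled Cholesky; assumes len(a) is a multiple of tile size t."""
    n = len(a)
    assert n % t == 0, "matrix size must be a multiple of the tile size"
    m = n // t
    # Only the lower-triangular tiles (i >= j) are stored and updated.
    tiles = {(i, j): [row[j * t:(j + 1) * t] for row in a[i * t:(i + 1) * t]]
             for i in range(m) for j in range(i + 1)}
    for k in range(m):
        tiles[k, k] = cholesky_tile(tiles[k, k])                         # factor diagonal tile
        for i in range(k + 1, m):
            tiles[i, k] = solve_right_lower_t(tiles[k, k], tiles[i, k])  # panel solve
        for i in range(k + 1, m):
            for j in range(k + 1, i + 1):                                # trailing update
                tiles[i, j] = sub_matmul_t(tiles[i, j], tiles[i, k], tiles[j, k])
    # Assemble the tiles back into one dense lower-triangular matrix.
    l = [[0.0] * n for _ in range(n)]
    for (i, j), tile in tiles.items():
        for r in range(t):
            for c in range(t):
                l[i * t + r][j * t + c] = tile[r][c]
    return l

# Usage: factor a small 4x4 covariance matrix with 2x2 tiles.
K = rbf_kernel_matrix([0.0, 0.5, 1.0, 1.5])
L = tiled_cholesky(K, 2)
```

Within one iteration of the outer loop, the panel solves are independent of each other, as are the trailing updates, which is where a tiled formulation exposes parallelism.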
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning in Materials Science · Model Reduction and Neural Networks
