Lightweight Gaussian Process Inference in C++ on Metal and CUDA
Yu-Hsueh Fang

TL;DR
LightGP is a fast, dependency-free C++17 library for Gaussian process inference supporting Metal and CUDA, outperforming Python-based libraries like GPyTorch in speed and scalability.
Contribution
The paper introduces LightGP, a novel C++17 library for GP inference with multiple optimized backends, offering significant speed improvements over existing Python libraries.
Findings
LightGP CPU is 2.6–8.7× faster than GPyTorch CPU on Apple M4.
LightGP CUDA is 2.3–6.7× faster than GPyTorch CUDA on NVIDIA RTX 3060.
Fused matrix-free kernel-vector product on Metal achieves 32× speedup at N=20,000.
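To make the "matrix-free kernel-vector product" concrete, here is a minimal CPU sketch in C++17. It computes y = Kv for an RBF kernel by evaluating each kernel entry on the fly, so the N×N matrix K is never stored. This is an illustration of the general idea only: the function name `rbf_matvec`, its parameters, and the choice of RBF kernel are assumptions for this example, not LightGP's API or its Metal implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of LightGP): y = K v without materializing K.
// X holds n points of dimension d in row-major order.
// Kernel assumed here: k(a, b) = variance * exp(-||a - b||^2 / (2 * lengthscale^2)).
std::vector<double> rbf_matvec(const std::vector<double>& X,
                               std::size_t n, std::size_t d,
                               const std::vector<double>& v,
                               double lengthscale, double variance) {
    std::vector<double> y(n, 0.0);
    const double inv_two_l2 = 1.0 / (2.0 * lengthscale * lengthscale);
    for (std::size_t i = 0; i < n; ++i) {
        double acc = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            // Squared Euclidean distance between points i and j.
            double sq = 0.0;
            for (std::size_t k = 0; k < d; ++k) {
                const double diff = X[i * d + k] - X[j * d + k];
                sq += diff * diff;
            }
            // Kernel entry is computed, used once, and discarded.
            acc += variance * std::exp(-sq * inv_two_l2) * v[j];
        }
        y[i] = acc;
    }
    return y;
}
```

Memory use is O(N) instead of O(N²), at the cost of recomputing kernel entries on every product. On a GPU, each output row i would typically map to its own thread or threadgroup; that mapping is not shown here.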
Abstract
Gaussian process (GP) inference in Python is dominated by libraries such as GPyTorch and GPflow, which are built on deep-learning frameworks and inherit their dispatch overhead and dependency footprint. We present LightGP, a dependency-free C++17 library for GP regression with Python bindings, supporting Apple Metal and NVIDIA CUDA backends alongside tuned CPU paths via Apple Accelerate and OpenBLAS. LightGP provides four inference paths (exact Cholesky, matrix-free conjugate gradients, sparse variational free energy, and structured kernel interpolation with FFT), covering problems from to . On an Apple M4, LightGP CPU is 2.6–8.7× faster than GPyTorch CPU for exact GP and faster for sparse GP at every scale tested. On an NVIDIA RTX 3060, LightGP CUDA is 2.3–6.7× faster than GPyTorch CUDA for exact GP up to , with…
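As a rough illustration of the conjugate-gradient inference path named in the abstract, the sketch below solves A x = b for a symmetric positive-definite operator A that is available only as a matrix-vector product. In GP regression, A would be the kernel matrix plus noise variance on the diagonal, and b the training targets. The names `MatVec` and `conjugate_gradient`, the tolerance, and the iteration cap are assumptions for this example; the abstract does not describe LightGP's actual solver or any preconditioning it may use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical type alias: any callable that returns A * p for a given p.
using MatVec = std::function<std::vector<double>(const std::vector<double>&)>;

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Plain (unpreconditioned) conjugate gradients for SPD systems, starting from x = 0.
std::vector<double> conjugate_gradient(const MatVec& apply_A,
                                       const std::vector<double>& b,
                                       double tol = 1e-10,
                                       std::size_t max_iter = 1000) {
    const std::size_t n = b.size();
    std::vector<double> x(n, 0.0);
    std::vector<double> r = b;   // residual b - A x, with x = 0
    std::vector<double> p = r;   // search direction
    double rs_old = dot(r, r);
    for (std::size_t it = 0; it < max_iter; ++it) {
        if (std::sqrt(rs_old) < tol) break;  // residual norm small enough
        const std::vector<double> Ap = apply_A(p);
        const double alpha = rs_old / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rs_new = dot(r, r);
        const double beta = rs_new / rs_old;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rs_old = rs_new;
    }
    return x;
}
```

Because the solver touches A only through `apply_A`, it can be paired with a matrix-free kernel-vector product, which is what lets this path avoid storing the full kernel matrix.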
