Inverse-Free Sparse Variational Gaussian Processes

Stefano Cortinovis; Laurence Aitchison; Stefanos Eleftheriadis; Mark van der Wilk

arXiv:2604.00697·stat.ML·April 2, 2026


TL;DR

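This paper makes inverse-free sparse variational Gaussian processes practical. A better-conditioned bound and a matmul-only natural-gradient update improve stability and convergence, so training avoids Cholesky decompositions and matrix inversions and is better suited to low-precision, massively parallel hardware.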

Contribution

The paper proposes a better-conditioned variational bound and a matmul-only natural-gradient update for the auxiliary matrix parameter, making sparse GPs more practical on hardware accelerators.

Findings

01

Achieves performance similar to traditional methods on regression and classification benchmarks.

02

Serves as a drop-in replacement in existing models.

03

Can be faster than baseline methods when properly tuned.

Abstract

Gaussian processes (GPs) offer appealing properties but are costly to train at scale. Sparse variational GP (SVGP) approximations reduce cost yet still rely on Cholesky decompositions of kernel matrices, ill-suited to low-precision, massively parallel hardware. While one can construct valid variational bounds that rely only on matrix multiplications (matmuls) via an auxiliary matrix parameter, optimising them with off-the-shelf first-order methods is challenging. We make the inverse-free approach practical by proposing a better-conditioned bound and deriving a matmul-only natural-gradient update for the auxiliary parameter, markedly improving stability and convergence. We further provide simple heuristics, such as step-size schedules and stopping criteria, that make the overall optimisation routine fit seamlessly into existing workflows. Across regression and classification benchmarks,…
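As a generic illustration of how an auxiliary matrix can stand in for an inverse using only matmuls, consider the standard inequality K^{-1} >= 2T - T K T, which holds for any symmetric T and positive-definite K and is tight at T = K^{-1}. The sketch below is not the bound or the natural-gradient update derived in the paper. It uses the classical Newton-Schulz iteration as a stand-in, and the names rbf_kernel, inverse_lower_bound, and newton_schulz, as well as the toy inputs, are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0, jitter=1e-2):
    """Squared-exponential kernel matrix on 1-D inputs, plus diagonal jitter."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2) + jitter * np.eye(x.shape[0])


def inverse_lower_bound(K, T):
    """Return 2T - T K T, a matmul-only lower bound on K^{-1} for symmetric T.

    The gap K^{-1} - (2T - T K T) equals (T - K^{-1}) K (T - K^{-1}), which is
    positive semi-definite and vanishes exactly when T = K^{-1}.
    """
    return 2.0 * T - T @ K @ T


def newton_schulz(K, num_steps=30):
    """Tighten T towards K^{-1} using only matmuls (classical Newton-Schulz).

    Starting from T = I / trace(K) keeps the eigenvalues of I - K T in [0, 1),
    and each step squares that residual: I - K T_new = (I - K T)^2.
    """
    T = np.eye(K.shape[0]) / np.trace(K)
    for _ in range(num_steps):
        T = inverse_lower_bound(K, T)
    return T


# Toy problem (hypothetical): 8 inducing inputs on a line.
x = np.linspace(0.0, 3.0, 8)
K = rbf_kernel(x)
T = newton_schulz(K)
residual = np.linalg.norm(np.eye(len(x)) - K @ T)
print(f"||I - K T||_F after Newton-Schulz: {residual:.2e}")
```

No Cholesky factorisation or linear solve appears anywhere in the loop. The cost of that convenience is sensitivity to conditioning: the smaller the smallest eigenvalue of K relative to its trace, the more steps the iteration needs, which gives some intuition for why conditioning and step-size choices matter for inverse-free objectives.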
