Kernel Distillation for Fast Gaussian Processes Prediction
Congzheng Song, Yiming Sun

TL;DR
This paper introduces kernel distillation, a method to approximate large Gaussian process models with a smaller, efficient model that retains accuracy while significantly reducing inference time.
Contribution
The paper proposes kernel distillation, a framework that combines the inducing points method with sparse low-rank approximation to approximate a fully trained teacher GP with a smaller student GP.
Findings
Kernel distillation reduces inference time significantly.
The method maintains high prediction accuracy.
It offers a better trade-off between prediction time and test performance than the alternatives it is compared against.
Abstract
Gaussian processes (GPs) are flexible models that can capture complex structure in large-scale datasets due to their non-parametric nature. However, the use of GPs in real-world applications is limited by their high computational cost at inference time. In this paper, we introduce a new framework, kernel distillation, to approximate a fully trained teacher GP model with a kernel matrix of size n × n for n training points. We combine the inducing points method with sparse low-rank approximation in the distillation procedure. The distilled student GP model costs only O(m^2) storage for m inducing points, where m ≪ n, and improves the inference time complexity. We demonstrate empirically that kernel distillation provides a better trade-off between prediction time and test performance than the alternatives.
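To make the inducing-point idea in the abstract concrete, the following is a minimal sketch of a generic Nyström-style low-rank approximation, in which the n × n kernel matrix is approximated by K_nm K_mm^{-1} K_nm^T using m inducing points. It is not the paper's distillation procedure, which also involves a sparse approximation that is not reproduced here. The RBF kernel, the evenly spaced inducing points, the jitter value, and all function and variable names are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def nystrom_approximation(x_train, x_inducing, jitter=1e-6):
    """Approximate K_nn by K_nm (K_mm + jitter * I)^{-1} K_nm^T (assumed setup)."""
    k_nm = rbf_kernel(x_train, x_inducing)
    k_mm = rbf_kernel(x_inducing, x_inducing)
    k_mm = k_mm + jitter * np.eye(len(x_inducing))  # stabilise the solve
    return k_nm @ np.linalg.solve(k_mm, k_nm.T)


rng = np.random.default_rng(0)
n, m = 500, 20  # n training points, m inducing points with m much smaller than n
x_train = rng.uniform(-3.0, 3.0, size=(n, 1))
x_inducing = np.linspace(-3.0, 3.0, m)[:, None]  # hypothetical inducing locations

k_exact = rbf_kernel(x_train, x_train)
k_approx = nystrom_approximation(x_train, x_inducing)
rel_error = np.linalg.norm(k_exact - k_approx) / np.linalg.norm(k_exact)
print(f"relative Frobenius error: {rel_error:.2e}")
```

On smooth one-dimensional data like this, a small number of inducing points already reproduces the full kernel matrix closely. How well such an approximation carries over to real datasets depends on the kernel and on where the inducing points are placed.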
Taxonomy
Topics: Gaussian Processes and Bayesian Inference
