Kernel Distillation for Fast Gaussian Processes Prediction

Congzheng Song; Yiming Sun

arXiv:1801.10273·stat.ML·November 6, 2018

TL;DR

This paper introduces kernel distillation, a method to approximate large Gaussian process models with a smaller, efficient model that retains accuracy while significantly reducing inference time.

Contribution

The paper proposes kernel distillation, a framework that combines the inducing points method with sparse low-rank approximation to approximate a fully trained teacher GP with a smaller student GP.

Findings

01

Kernel distillation reduces inference time significantly.

02

The method maintains high prediction accuracy.

03

Kernel distillation offers a better trade-off between prediction time and test performance than the alternative approximations it is compared against.

Abstract

Gaussian processes (GPs) are flexible models that can capture complex structure in large-scale datasets due to their non-parametric nature. However, the use of GPs in real-world applications is limited by their high computational cost at inference time. In this paper, we introduce a new framework, kernel distillation, to approximate a fully trained teacher GP model with a kernel matrix of size $n \times n$ for $n$ training points. We combine the inducing points method with sparse low-rank approximation in the distillation procedure. The distilled student GP model costs only $O(m^2)$ storage for $m$ inducing points, where $m \ll n$, and improves the inference time complexity. We demonstrate empirically that kernel distillation provides a better trade-off between prediction time and test performance than the alternatives.
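
To make the inducing-point idea in the abstract concrete, here is a minimal NumPy sketch of a generic subset-of-regressors approximation. It is not the paper's distillation procedure and omits the sparse low-rank step entirely. The RBF kernel, the function names (`rbf_kernel`, `fit_inducing_point_gp`, `predict_mean`), and all default hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def fit_inducing_point_gp(x_train, y_train, x_inducing,
                          lengthscale=1.0, noise_var=1e-2, jitter=1e-6):
    """Subset-of-regressors fit (illustrative, not the paper's method).

    Returns a weight vector of length m, one weight per inducing point.
    """
    k_mm = rbf_kernel(x_inducing, x_inducing, lengthscale)  # shape (m, m)
    k_mn = rbf_kernel(x_inducing, x_train, lengthscale)     # shape (m, n)
    # Solve (noise_var * K_mm + K_mn K_nm) w = K_mn y for w.
    system = noise_var * k_mm + k_mn @ k_mn.T
    system += jitter * np.eye(len(x_inducing))  # numerical stabiliser
    return np.linalg.solve(system, k_mn @ y_train)


def predict_mean(x_test, x_inducing, weights, lengthscale=1.0):
    """Predictive mean; cost per test point is O(m), independent of n."""
    return rbf_kernel(x_test, x_inducing, lengthscale) @ weights
```

After fitting, the mean predictor needs only the m inducing inputs and m weights, so the n training points can be discarded at prediction time. Keeping the m × m solved system as well, for example to compute predictive variances, would put the storage on the same $O(m^2)$ order the abstract quotes for the student model.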

Taxonomy

Topics: Gaussian Processes and Bayesian Inference