Scaling Gaussian Processes for Learning Curve Prediction via Latent Kronecker Structure

Jihao Andreas Lin; Sebastian Ament; Maximilian Balandat; Eytan Bakshy

arXiv:2410.09239 · cs.LG · October 15, 2024

TL;DR

This paper introduces a scalable Gaussian process model with latent Kronecker structure for learning curve prediction in AutoML. The model handles missing learning curve values and reduces the time complexity of inference from $O(n^3 m^3)$ for naive GPs to $\mathcal{O}(n^3 + m^3)$.

Contribution

It proposes a novel latent Kronecker structure for Gaussian processes that enables efficient inference on large-scale learning curve data with missing values.

Findings

01

Achieves $\mathcal{O}(n^3 + m^3)$ time complexity, compared with $O(n^3 m^3)$ for naive GPs.

02

Matches Transformer performance on learning curve prediction tasks.

03

Effectively handles missing learning curve observations with structured covariance.
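
The complexity gap in finding 01 can be illustrated with a standard Kronecker identity. The notation below is assumed here for illustration and is not taken from the paper: $K_1$ is an $n \times n$ kernel matrix over hyper-parameter configurations, $K_2$ is an $m \times m$ kernel matrix over training steps, $P$ is a selection matrix that keeps only the observed entries, and $\operatorname{vec}$ stacks the rows of an $n \times m$ matrix $V$.

```latex
\begin{align*}
K_{\text{obs}} &= P \, (K_1 \otimes K_2) \, P^{\top}, \\
(K_1 \otimes K_2)\, \operatorname{vec}(V) &= \operatorname{vec}\!\left(K_1 V K_2^{\top}\right), \\
\text{cost of the right-hand side} &= O(n^2 m + n m^2)
\quad \text{versus} \quad O(n^2 m^2) \text{ for a dense product.}
\end{align*}
```

Under these assumptions, a matrix-vector product with $K_{\text{obs}}$ never requires forming the $nm \times nm$ matrix: scatter the vector into the full grid, apply the identity, and gather the observed entries.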

Abstract

A key task in AutoML is to model learning curves of machine learning models jointly as a function of model hyper-parameters and training progression. While Gaussian processes (GPs) are suitable for this task, naïve GPs require $O(n^3 m^3)$ time and $O(n^2 m^2)$ space for $n$ hyper-parameter configurations and $O(m)$ learning curve observations per hyper-parameter. Efficient inference via Kronecker structure is typically incompatible with early-stopping due to missing learning curve values. We impose latent Kronecker structure to leverage efficient product kernels while handling missing values. In particular, we interpret the joint covariance matrix of observed values as the projection of a latent Kronecker product. Combined with iterative linear solvers and structured matrix-vector multiplication, our method only requires $\mathcal{O}(n^3 +…
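
The idea of treating the observed covariance as a projection of a latent Kronecker product can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`rbf_kernel`, `latent_kron_matvec`, `conjugate_gradient`), the squared-exponential kernels, the noise level, the toy early-stopping mask, and the choice of plain conjugate gradients as the iterative solver are all hypothetical.

```python
import numpy as np

def rbf_kernel(x, lengthscale):
    """Squared-exponential kernel matrix for 1-D inputs x (illustrative choice)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def latent_kron_matvec(K1, K2, mask, v, noise):
    """Apply (P (K1 kron K2) P^T + noise * I) to v without forming the Kronecker product."""
    V = np.zeros(mask.shape)
    V[mask] = v                  # P^T v: scatter observed values onto the full n x m grid
    W = K1 @ V @ K2.T            # (K1 kron K2) vec(V), with row-major vec
    return W[mask] + noise * v   # P (...): gather the observed entries again

def conjugate_gradient(matvec, b, tol=1e-8, max_iter=1000):
    """Plain conjugate gradients for a symmetric positive definite operator."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy problem: n configurations, m training steps, curves cut off by early stopping.
n, m = 6, 8
rng = np.random.default_rng(0)
K1 = rbf_kernel(np.linspace(0.0, 1.0, n), 0.3)   # kernel over hyper-parameters (assumed 1-D)
K2 = rbf_kernel(np.linspace(0.0, 1.0, m), 0.3)   # kernel over training progression
stops = rng.integers(2, m + 1, size=n)           # hypothetical stopping step per curve
mask = np.arange(m)[None, :] < stops[:, None]    # True where a value was observed
y = rng.standard_normal(mask.sum())              # synthetic observations
noise = 0.1

# Structured solve: only matrix-vector products with K1 and K2 are needed.
alpha_cg = conjugate_gradient(
    lambda v: latent_kron_matvec(K1, K2, mask, v, noise), y
)

# Dense reference for comparison: build the full Kronecker product and project it.
idx = np.flatnonzero(mask.ravel())
K_obs = np.kron(K1, K2)[np.ix_(idx, idx)] + noise * np.eye(idx.size)
alpha_dense = np.linalg.solve(K_obs, y)
print(np.max(np.abs(alpha_cg - alpha_dense)))
```

The dense reference is included only to check the structured product on a small grid; at realistic sizes it is exactly the $nm \times nm$ matrix the structured approach avoids storing.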

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Time Series Analysis and Forecasting · Multidisciplinary Science and Engineering Research

Methods: Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Attention Is All You Need · Linear Layer