Scaling Gaussian Processes for Learning Curve Prediction via Latent Kronecker Structure
Jihao Andreas Lin, Sebastian Ament, Maximilian Balandat, Eytan Bakshy

TL;DR
This paper introduces a scalable Gaussian process model with latent Kronecker structure for efficient learning curve prediction in AutoML, handling missing data and reducing computational complexity significantly.
Contribution
It proposes a novel latent Kronecker structure for Gaussian processes that enables efficient inference on large-scale learning curve data with missing values.
Findings
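Achieves $\mathcal{O}(n^3 + m^3)$ time complexity, compared with $\mathcal{O}(n^3 m^3)$ for naive GP inference over $n$ hyper-parameter configurations and $m$ observations per learning curve.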
Matches Transformer performance on learning curve prediction tasks.
Effectively handles missing learning curve observations with structured covariance.
Abstract
A key task in AutoML is to model learning curves of machine learning models jointly as a function of model hyper-parameters and training progression. While Gaussian processes (GPs) are suitable for this task, naïve GPs require $\mathcal{O}(n^3 m^3)$ time and $\mathcal{O}(n^2 m^2)$ space for $n$ hyper-parameter configurations and $m$ learning curve observations per hyper-parameter. Efficient inference via Kronecker structure is typically incompatible with early-stopping due to missing learning curve values. We impose latent Kronecker structure to leverage efficient product kernels while handling missing values. In particular, we interpret the joint covariance matrix of observed values as the projection of a latent Kronecker product. Combined with iterative linear solvers and structured matrix-vector multiplication, our method only requires $\mathcal{O}(n^3 +…
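The projection idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the RBF kernels, the toy data, and the names `rbf`, `latent_kron_mvm`, `conjugate_gradients`, and `mask` are all assumptions made for illustration. The sketch treats the covariance of the observed values as $P (K_x \otimes K_t) P^\top$, where $P$ selects the observed entries of the full $n \times m$ grid, and solves the resulting linear system with conjugate gradients using only structured matrix-vector products.

```python
import numpy as np

def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D input arrays (assumed kernel choice)."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def latent_kron_mvm(K_x, K_t, mask, v, noise):
    """Compute (P (K_x kron K_t) P^T + noise * I) v without forming the Kronecker product.

    mask is an (n, m) boolean array marking observed curve values;
    v holds one entry per observed value, in row-major order.
    """
    V = np.zeros(mask.shape)
    V[mask] = v                 # P^T v: zero-pad into the full n x m grid
    W = K_x @ V @ K_t.T         # (K_x kron K_t) applied to the row-major vectorization of V
    return W[mask] + noise * v  # P (...): keep only the observed entries

def conjugate_gradients(mvm, b, tol=1e-10, max_iter=1000):
    """Plain conjugate gradients for a symmetric positive-definite operator."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = mvm(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy problem (hypothetical data): n curves of up to m steps, each stopped early.
rng = np.random.default_rng(0)
n, m = 6, 8
x = rng.uniform(-1, 1, size=n)                # 1-D hyper-parameter values
t = np.linspace(0.0, 1.0, m)                  # normalized training progression
stop = rng.integers(3, m + 1, size=n)         # early-stopping step per curve
mask = np.arange(m)[None, :] < stop[:, None]  # True where a value was observed
K_x, K_t = rbf(x, x), rbf(t, t, lengthscale=0.3)
noise = 1e-2
y = rng.standard_normal(mask.sum())

# Solve (P (K_x kron K_t) P^T + noise * I) alpha = y using structured products only.
alpha = conjugate_gradients(lambda v: latent_kron_mvm(K_x, K_t, mask, v, noise), y)

# Dense reference, feasible only at toy scale, to check the structured solve.
idx = mask.ravel()
K_dense = np.kron(K_x, K_t)[np.ix_(idx, idx)] + noise * np.eye(idx.sum())
alpha_dense = np.linalg.solve(K_dense, y)
max_err = np.max(np.abs(alpha - alpha_dense))
```

Each structured product in this sketch costs $\mathcal{O}(n^2 m + n m^2)$ operations and never materializes the $nm \times nm$ matrix; the dense reference exists only to confirm agreement on a small example.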
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Time Series Analysis and Forecasting · Multidisciplinary Science and Engineering Research
Methods: Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Attention Is All You Need · Linear Layer
