Guiding Neural Network Initialization via Marginal Likelihood Maximization

Anthony S. Tai; Chunfeng Huang

arXiv:2012.09943·stat.ML·December 21, 2020

TL;DR

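This paper introduces a data-driven method for choosing neural network initialization hyperparameters by maximizing the marginal likelihood of the corresponding Gaussian process model. On MNIST, the recommended values yield near-optimal prediction performance, and the results suggest the procedure could be run on smaller training sets to save computation.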

Contribution

The paper uses the correspondence between neural networks and Gaussian processes with matching activation and covariance functions to infer hyperparameter values for initialization, and evaluates the approach on MNIST classification.
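To make the link between Gaussian process hyperparameters and network initialization concrete, here is a minimal sketch of how a selected pair of variances could be turned into initial weights. It assumes the parameterization commonly used for the neural network and Gaussian process correspondence (weights drawn with variance `sigma_w2 / fan_in`, biases with variance `sigma_b2`); the page does not confirm that the paper uses exactly this scaling, and the names `init_layer`, `init_mlp`, `sigma_w2`, and `sigma_b2` are hypothetical.

```python
import numpy as np


def init_layer(fan_in, fan_out, sigma_w2, sigma_b2, rng):
    """Draw one dense layer's parameters from zero-mean Gaussians.

    Assumption: weight variance is sigma_w2 / fan_in and bias variance is
    sigma_b2, the scaling usually paired with the infinite-width GP limit.
    """
    W = rng.normal(0.0, np.sqrt(sigma_w2 / fan_in), size=(fan_in, fan_out))
    b = rng.normal(0.0, np.sqrt(sigma_b2), size=fan_out)
    return W, b


def init_mlp(layer_sizes, sigma_w2, sigma_b2, seed=0):
    """Initialize every layer of a fully connected network with shared variances."""
    rng = np.random.default_rng(seed)
    return [
        init_layer(m, n, sigma_w2, sigma_b2, rng)
        for m, n in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


# Example: an MNIST-sized network initialized with illustrative values.
params = init_mlp([784, 256, 10], sigma_w2=2.0, sigma_b2=0.1)
```

The values `2.0` and `0.1` above are placeholders, not values reported in the paper; in the proposed workflow they would come from the marginal likelihood search.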

Findings

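01  Marginal likelihood maximization recommends hyperparameter values that yield near-optimal MNIST classification performance under the experiment's constraints.

02  The technique gives consistent results across the reported experiments.

03  This consistency suggests the computational cost of the procedure could be reduced by using smaller training sets.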

Abstract

We propose a simple, data-driven approach to help guide hyperparameter selection for neural network initialization. We leverage the relationship between neural network and Gaussian process models with corresponding activation and covariance functions to infer the hyperparameter values desirable for model initialization. Our experiment shows that marginal likelihood maximization provides recommendations that yield near-optimal prediction performance on the MNIST classification task under our experimental constraints. Furthermore, our empirical results indicate consistency in the proposed technique, suggesting that the computational cost of the procedure could be significantly reduced by using smaller training sets.
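The procedure described in the abstract can be sketched as a grid search over the log marginal likelihood. This is an illustration under stated assumptions, not the authors' implementation: it assumes a fully connected ReLU network, whose Gaussian process covariance has a closed form (the arc-cosine kernel recursion), treats classification as regression on centered one-hot targets, and fixes a small noise variance. The function names, the grid values, and the synthetic data are all hypothetical.

```python
import numpy as np


def relu_nngp_kernel(X, sigma_w2, sigma_b2, depth=1):
    """Covariance of the GP matching a fully connected ReLU network.

    Assumes weights ~ N(0, sigma_w2 / fan_in) and biases ~ N(0, sigma_b2).
    Inputs must give a strictly positive kernel diagonal.
    """
    d = X.shape[1]
    K = sigma_b2 + sigma_w2 * (X @ X.T) / d  # input-layer covariance
    for _ in range(depth):
        std = np.sqrt(np.diag(K))
        norm = np.outer(std, std)
        cos_theta = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # Closed-form ReLU expectation (arc-cosine kernel of order 1).
        K = sigma_b2 + sigma_w2 / (2.0 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * cos_theta
        )
    return K


def log_marginal_likelihood(K, Y, noise_var):
    """GP log marginal likelihood, summed over independent output columns."""
    n, c = Y.shape
    A = K + noise_var * np.eye(n)
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(A, Y)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return (
        -0.5 * np.sum(Y * alpha)
        - 0.5 * c * log_det
        - 0.5 * n * c * np.log(2.0 * np.pi)
    )


def select_init_hyperparameters(X, Y, sigma_w2_grid, sigma_b2_grid,
                                depth=1, noise_var=1e-2):
    """Return (score, sigma_w2, sigma_b2) maximizing the log marginal likelihood."""
    best = None
    for sw2 in sigma_w2_grid:
        for sb2 in sigma_b2_grid:
            K = relu_nngp_kernel(X, sw2, sb2, depth)
            score = log_marginal_likelihood(K, Y, noise_var)
            if best is None or score > best[0]:
                best = (score, sw2, sb2)
    return best


# Synthetic stand-in for a small training subset (not MNIST).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
labels = rng.integers(0, 3, size=50)
Y = np.eye(3)[labels] - 1.0 / 3.0  # centered one-hot targets
score, sw2, sb2 = select_init_hyperparameters(
    X, Y, sigma_w2_grid=[0.5, 1.0, 2.0, 4.0], sigma_b2_grid=[0.01, 0.1, 1.0]
)
print(f"selected sigma_w2={sw2}, sigma_b2={sb2}, log ML={score:.2f}")
```

Because the cost is dominated by factorizing an n-by-n matrix for each grid point, running the search on a smaller subset of the training data is where the computational saving mentioned in the abstract would come from.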

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Gaussian Process