An Effective and Efficient Initialization Scheme for Training Multi-layer Feedforward Neural Networks

Zebin Yang; Hengtao Zhang; Agus Sudjianto; Aijun Zhang

arXiv:2005.08027·cs.LG·June 26, 2020

TL;DR

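This paper introduces an initialization scheme for multi-layer feedforward neural networks based on Stein's identity. Hidden layers are initialized one at a time from eigenvectors of the cross-moment matrix between the input's second-order score function and the response, and the output layer is initialized by generalized linear modeling. The authors report faster and more accurate training than with other popular methods.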

Contribution

The paper proposes SteinGLM, a new initialization method leveraging Stein's identity and eigenvector computations, providing a more effective and efficient way to initialize neural networks.

Findings

01 SteinGLM is reported to train much faster than other popular methods commonly used for training neural networks.

02 SteinGLM is reported to be more accurate than those methods.

03 Both claims are supported by extensive numerical experiments.

Abstract

Network initialization is the first and a critical step in training neural networks. In this paper, we propose a novel network initialization scheme based on the celebrated Stein's identity. By viewing multi-layer feedforward neural networks as cascades of multi-index models, we initialize the projection weights to the first hidden layer using eigenvectors of the cross-moment matrix between the input's second-order score function and the response. The input data are then forward propagated to the next layer, and the procedure is repeated until all hidden layers are initialized. Finally, the weights of the output layer are initialized by generalized linear modeling. Extensive numerical results show that the proposed SteinGLM method is much faster and more accurate than other popular methods commonly used for training neural networks.
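The layer-wise procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the inputs to each layer are approximately Gaussian, so that the second-order score function has the closed form S2(x) = Σ⁻¹(x − μ)(x − μ)ᵀΣ⁻¹ − Σ⁻¹, and it reuses that same formula for hidden activations as a further approximation. The function names `stein_layer_init` and `steinglm_init`, the tanh activation, the small ridge term, the bias choice, and the use of ordinary least squares (a generalized linear model with identity link) for the output layer are all assumptions made for illustration. The sketch also requires each hidden width to be no larger than the width of the layer feeding it.

```python
import numpy as np


def stein_layer_init(X, y, n_units):
    """Weights (n_features, n_units) from the leading eigenvectors of the
    estimated cross-moment matrix E[y * S2(x)], assuming Gaussian inputs."""
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    # Sample covariance with a small ridge (an assumption, for stability).
    cov = Xc.T @ Xc / n + 1e-6 * np.eye(d)
    prec = np.linalg.inv(cov)
    Z = Xc @ prec  # each row is Sigma^{-1} (x - mu)
    # Gaussian second-order score: S2(x) = z z^T - Sigma^{-1}.
    M = (Z * y[:, None]).T @ Z / n - y.mean() * prec
    M = (M + M.T) / 2  # symmetrise against round-off
    eigvals, eigvecs = np.linalg.eigh(M)
    # Keep the directions with the largest absolute eigenvalues.
    order = np.argsort(-np.abs(eigvals))[:n_units]
    return eigvecs[:, order]


def steinglm_init(X, y, hidden_sizes):
    """Initialise hidden layers one by one, then fit the output layer."""
    params, H = [], X
    for k in hidden_sizes:
        W = stein_layer_init(H, y, k)
        b = -H.mean(axis=0) @ W  # centre the projections (illustrative choice)
        params.append((W, b))
        H = np.tanh(H @ W + b)  # forward propagate to the next layer
    # Output layer: least squares, i.e. a GLM with identity link.
    A = np.column_stack([H, np.ones(len(H))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return params, (coef[:-1], coef[-1])
```

Calling `steinglm_init(X, y, [3, 2])` on a regression data set with at least three input features returns one `(W, b)` pair per hidden layer plus the output weights and intercept, which could then be copied into a network before gradient-based training. For a classification response, the least-squares step would presumably be replaced by a logistic fit.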

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Machine Learning and ELM