Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme
Johnny Jingze Li, Vivek Kurien George, Gabriel A. Silva

TL;DR
This paper presents a new neural network initialization method that enhances emergence, leading to improved accuracy and training speed across various architectures without additional optimization steps.
Contribution
The paper introduces a simple initialization scheme that adjusts layer-wise weight scaling factors to raise a network's emergence measure, improving accuracy and training speed without extra optimization steps at initialization.
Findings
Improved model accuracy across multiple architectures
Faster training convergence
Effective without batch normalization
Abstract
Emergence in machine learning refers to the spontaneous appearance of complex behaviors or capabilities that arise from the scale and structure of training data and model architectures, despite not being explicitly programmed. We introduce a novel yet straightforward neural network initialization scheme that aims at achieving greater potential for emergence. Measuring emergence as a kind of structural nonlinearity, our method adjusts the layer-wise weight scaling factors to achieve higher emergence values. This enhancement is easy to implement, requiring no additional optimization steps for initialization compared to GradInit. We evaluate our approach across various architectures, including MLP and convolutional architectures for image recognition and transformers for machine translation. We demonstrate substantial improvements in both model accuracy and training speed, with and without…
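The abstract describes the method only as adjusting layer-wise weight scaling factors, so the following is a minimal sketch of that general idea rather than the authors' scheme. It draws each layer from a He-style normal distribution and multiplies the standard deviation by a per-layer factor. The function names `scaled_he_init` and `empirical_std`, the choice of He initialization as the base, the layer sizes, and the values in `scales` are all assumptions made for illustration; the paper's actual factors come from its emergence measure, which is not reproduced here.

```python
import math
import random


def scaled_he_init(layer_sizes, scales, seed=0):
    """Return one weight matrix (list of rows) per layer.

    Each matrix is sampled from a He-style normal distribution whose
    standard deviation is multiplied by a per-layer factor. The factors
    are hypothetical placeholders, not values taken from the paper.
    """
    if len(scales) != len(layer_sizes) - 1:
        raise ValueError("need exactly one scale per weight matrix")
    rng = random.Random(seed)
    weights = []
    for fan_in, fan_out, scale in zip(layer_sizes, layer_sizes[1:], scales):
        std = scale * math.sqrt(2.0 / fan_in)  # He std times layer factor
        matrix = [[rng.gauss(0.0, std) for _ in range(fan_in)]
                  for _ in range(fan_out)]
        weights.append(matrix)
    return weights


def empirical_std(matrix):
    """Standard deviation of all entries in a weight matrix."""
    values = [v for row in matrix for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


# Illustrative MLP shape and factors (assumed, not from the paper).
layer_sizes = [64, 128, 128, 10]
scales = [2.0, 1.0, 0.5]
weights = scaled_he_init(layer_sizes, scales)

for matrix, scale in zip(weights, scales):
    print(len(matrix), len(matrix[0]), scale, round(empirical_std(matrix), 4))
```

Because the factors are applied once when the weights are sampled, a scheme of this shape adds no optimization loop at initialization, which is the property the abstract contrasts with GradInit.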
Taxonomy
Topics: Neural Networks and Applications
