Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme

Johnny Jingze Li; Vivek Kurien George; Gabriel A. Silva

arXiv:2407.19044 · cs.LG · January 7, 2025

TL;DR

This paper presents a new neural network initialization method that enhances emergence, leading to improved accuracy and training speed across various architectures without additional optimization steps.

Contribution

The paper introduces a simple initialization scheme that adjusts layer-wise weight scaling factors to raise a network's measured emergence, improving both accuracy and training speed.

Findings

01. Improved model accuracy across multiple architectures
02. Faster training convergence
03. Effective without batch normalization

Abstract

Emergence in machine learning refers to the spontaneous appearance of complex behaviors or capabilities that arise from the scale and structure of training data and model architectures, despite not being explicitly programmed. We introduce a novel yet straightforward neural network initialization scheme that aims to increase a network's potential for emergence. Measuring emergence as a kind of structural nonlinearity, our method adjusts the layer-wise weight scaling factors to achieve higher emergence values. The scheme is easy to implement and, unlike GradInit, requires no additional optimization steps at initialization. We evaluate our approach across various architectures, including MLP and convolutional architectures for image recognition and transformers for machine translation. We demonstrate substantial improvements in both model accuracy and training speed, with and without…
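
The abstract describes the method only as adjusting layer-wise weight scaling factors at initialization. The following is a minimal sketch of that general idea, not the authors' scheme: it draws a standard Xavier-uniform initialization for an MLP and then multiplies each layer's weight matrix by its own factor. The function name `init_mlp_with_layer_scales`, the layer sizes, and the values in `layer_scales` are all hypothetical, chosen for illustration; the paper's actual factors and its emergence measure are not reproduced here.

```python
import numpy as np


def init_mlp_with_layer_scales(layer_sizes, layer_scales, seed=0):
    """Xavier-uniform init, then rescale each layer by its own factor.

    layer_sizes:  widths of consecutive layers, e.g. [784, 256, 256, 10]
    layer_scales: one multiplier per weight matrix (hypothetical values,
                  NOT the factors derived in the paper)
    """
    if len(layer_scales) != len(layer_sizes) - 1:
        raise ValueError("need exactly one scale per weight matrix")

    rng = np.random.default_rng(seed)
    weights = []
    fan_pairs = zip(layer_sizes[:-1], layer_sizes[1:])
    for (fan_in, fan_out), scale in zip(fan_pairs, layer_scales):
        # Standard Xavier/Glorot uniform bound for this layer.
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        w = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        # Layer-wise rescaling: a one-off multiplication, with no
        # optimization loop run at initialization time.
        weights.append(scale * w)
    return weights


layer_sizes = [784, 256, 256, 10]
layer_scales = [2.0, 1.0, 0.5]  # hypothetical illustration values
weights = init_mlp_with_layer_scales(layer_sizes, layer_scales)
```

Because the rescaling is a single multiplication per layer, a sketch like this adds no optimization steps at initialization, which is the property the abstract contrasts with GradInit.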

Taxonomy

Topics: Neural Networks and Applications