Structured First-Layer Initialization Pre-Training Techniques to Accelerate Training Process Based on $\varepsilon$-Rank

Tao Tang; Jiang Yang; Yuxiang Zhao; Quanhui Zhu

arXiv:2507.11962 · math.NA · July 17, 2025

TL;DR

This paper introduces a structured first-layer initialization (SFLI) method that increases the diversity of neural features at initialization, leading to faster training, better convergence, and improved accuracy for neural networks used in scientific computing.

Contribution

It proposes a novel SFLI pre-training technique that constructs $\varepsilon$-linearly independent neurons, improving training efficiency across various architectures.

Findings

01. SFLI increases the initial $\varepsilon$-rank and accelerates convergence.

02. The method mitigates spectral bias and improves prediction accuracy.

03. Implementation requires adding only one line of code.

Abstract

Training deep neural networks for scientific computing remains computationally expensive due to the slow formation of diverse feature representations in early training stages. Recent studies identify a staircase phenomenon in training dynamics, where loss decreases are closely correlated with increases in $\varepsilon$-rank, reflecting the effective number of linearly independent neuron functions. Motivated by this observation, this work proposes a structured first-layer initialization (SFLI) pre-training method to enhance the diversity of neural features at initialization by constructing $\varepsilon$-linearly independent neurons in the input layer. We present systematic initialization schemes compatible with various activation functions and integrate the strategy into multiple neural architectures, including modified multi-layer perceptrons and physics-informed residual adaptive…
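
To make the notion of an "effective number of linearly independent neuron functions" concrete, the following is a minimal sketch of one way such a quantity could be estimated numerically. It assumes a working definition that is not taken from the paper: the number of eigenvalues of the sampled Gram matrix of neuron outputs that exceed a threshold `eps`. The function name `epsilon_rank`, the default threshold, and the toy tanh neurons are all hypothetical choices made for illustration.

```python
import numpy as np


def epsilon_rank(features: np.ndarray, eps: float = 1e-3) -> int:
    """Count eigenvalues of the sampled Gram matrix that exceed eps.

    features has shape (n_samples, n_neurons); column j holds neuron j
    evaluated at the sample points. This working definition is an
    assumption for illustration, not necessarily the paper's exact one.
    """
    n_samples = features.shape[0]
    # Discrete approximation of the inner products <phi_i, phi_j>.
    gram = features.T @ features / n_samples
    # The Gram matrix is symmetric, so eigvalsh returns real eigenvalues.
    eigvals = np.linalg.eigvalsh(gram)
    return int(np.sum(eigvals > eps))


# Sample points on [-1, 1], shaped (200, 1).
x = np.linspace(-1.0, 1.0, 200)[:, None]

# Four identical tanh neurons: only one independent function.
duplicated = np.tanh(x @ np.ones((1, 4)))

# Four tanh neurons with distinct (hypothetical) shifts.
shifts = np.array([-0.75, -0.25, 0.25, 0.75])
shifted = np.tanh(5.0 * (x - shifts))

print(epsilon_rank(duplicated), epsilon_rank(shifted))
```

In this toy setting the duplicated neurons span a single direction, while the shifted neurons span more than one, which is the kind of feature diversity the abstract associates with a higher $\varepsilon$-rank at initialization.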

Taxonomy

Topics: Technology and Data Analysis · Engineering and Test Systems · Educational Technology and Assessment