Effects of Initialization Biases on Deep Neural Network Training Dynamics

Nicholas Pellegrino; David Szczecina; Paul W. Fieguth

arXiv:2511.20826 · cs.LG · November 27, 2025

Open Access

TL;DR

This paper investigates how initial biases in untrained neural networks influence early training dynamics, emphasizing the importance of loss function choice in managing these biases and their impact on model performance.

Contribution

It analyzes how Initial Guessing Bias interacts with different loss functions to shape early training behavior in neural networks.

Findings

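01. Just after random initialization, untrained networks favor a small subset of classes, a bias termed Initial Guessing Bias.

02. Blurry and Piecewise-zero loss, both designed for robustness to label errors, can become unable to steer the direction of training when exposed to this initial bias.

03. The choice of loss function has a dramatic effect on early-phase training, so its interaction with Initial Guessing Bias needs careful consideration.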

Abstract

Untrained large neural networks, just after random initialization, tend to favour a small subset of classes, assigning high predicted probabilities to these few classes and approximately zero probability to all others. This bias, termed Initial Guessing Bias, affects the early training dynamics, when the model is fitting to the coarse structure of the data. The choice of loss function against which to train the model has a large impact on how these early dynamics play out. Two recent loss functions, Blurry and Piecewise-zero loss, were designed for robustness to label errors but can become unable to steer the direction of training when exposed to this initial bias. Results indicate that the choice of loss function has a dramatic effect on the early-phase training of networks, and highlight the need for careful consideration of how Initial Guessing Bias may interact with various…
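
As a rough illustration of the bias the abstract describes, the sketch below measures how an untrained network spreads its predictions across classes. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `untrained_class_fractions`, the ReLU multilayer perceptron, the He-style weight scaling, the layer sizes, and the standard Gaussian inputs standing in for a dataset. With perfectly balanced guessing each class would receive about `1 / num_classes` of the inputs; a sorted list dominated by a few entries would indicate the skew toward a small subset of classes.

```python
import numpy as np


def untrained_class_fractions(num_classes=10, depth=8, width=256,
                              input_dim=64, num_inputs=2000, seed=0):
    """Fraction of random inputs an untrained ReLU MLP assigns to each class."""
    rng = np.random.default_rng(seed)

    # Hypothetical data: standard Gaussian inputs, not a real dataset.
    h = rng.standard_normal((num_inputs, input_dim))

    fan_in = input_dim
    for _ in range(depth):
        # He-style random weights (an assumption), no bias terms.
        w = rng.standard_normal((fan_in, width)) * np.sqrt(2.0 / fan_in)
        h = np.maximum(h @ w, 0.0)  # ReLU activation
        fan_in = width

    # Random linear read-out layer producing one logit per class.
    w_out = rng.standard_normal((fan_in, num_classes)) / np.sqrt(fan_in)
    logits = h @ w_out

    # Count how often each class is the network's top guess.
    predictions = logits.argmax(axis=1)
    counts = np.bincount(predictions, minlength=num_classes)
    return counts / num_inputs


fractions = untrained_class_fractions()
# Sorted from most-predicted to least-predicted class.
print(np.round(np.sort(fractions)[::-1], 3))
```

Varying `depth` or `seed` in this sketch is one way to see how the spread of guesses changes with the assumed architecture; it says nothing about the specific networks or loss functions studied in the paper.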

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning