The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions

Devin Kwok; Gül Sena Altıntaş; Colin Raffel; David Rolnick

arXiv:2506.13234 · cs.LG · June 17, 2025

TL;DR

This paper investigates how small perturbations to neural network training, especially early in training, can lead to divergent models, with consequences for stability, fine-tuning, and ensemble diversity.

Contribution

It demonstrates that even tiny perturbations applied during the early, chaotic phase of training cause training trajectories to diverge significantly, with implications for model stability and ensemble methods.
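The contribution rests on a simple experimental pattern: train two copies from the same initialization with the same minibatch order, nudge one copy at step `t`, and measure how far apart their parameters end up. The NumPy sketch below illustrates that pattern on a toy problem. Everything concrete in it is an assumption for illustration: the tiny tanh MLP, the synthetic regression data, the hypothetical helper names (`train`, `l2_distance`), and the perturbation itself (Gaussian noise rescaled to total L2 norm `eps`), which is not necessarily the perturbation used in the paper. Whether the gap grows or shrinks in a model this small depends on the learning rate and width, so it should not be read as reproducing the paper's results.

```python
import numpy as np

def init_params(rng, d_in=2, width=16):
    # Hypothetical one-hidden-layer tanh MLP, far smaller than the paper's models.
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(d_in), (width, d_in)),
        "b1": np.zeros(width),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(width), (1, width)),
        "b2": np.zeros(1),
    }

def loss_and_grads(p, X, y):
    # Mean-squared-error loss with hand-derived backpropagation.
    h = np.tanh(X @ p["W1"].T + p["b1"])       # hidden activations, (n, width)
    err = (h @ p["W2"].T + p["b2"])[:, 0] - y  # residuals, (n,)
    d_out = err[:, None] / len(y)              # dLoss/dOutput, (n, 1)
    d_h = (d_out @ p["W2"]) * (1.0 - h ** 2)   # back through tanh, (n, width)
    grads = {"W2": d_out.T @ h, "b2": d_out.sum(axis=0),
             "W1": d_h.T @ X, "b1": d_h.sum(axis=0)}
    return 0.5 * np.mean(err ** 2), grads

def train(perturb_step, eps, steps=300, lr=0.1, batch=32, seed=0):
    # Same seed => same data, initialization and minibatch order in every run.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(256, 2))
    y = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])
    p = init_params(rng)
    noise_rng = np.random.default_rng(seed + 1)  # separate stream: batches unchanged
    for t in range(steps):
        if t == perturb_step and eps > 0:
            # Assumed perturbation: isotropic Gaussian noise rescaled to L2 norm eps.
            noise = {k: noise_rng.normal(size=v.shape) for k, v in p.items()}
            norm = np.sqrt(sum(np.sum(n ** 2) for n in noise.values()))
            p = {k: v + eps * noise[k] / norm for k, v in p.items()}
        idx = rng.choice(len(y), size=batch, replace=False)
        _, g = loss_and_grads(p, X[idx], y[idx])
        p = {k: v - lr * g[k] for k, v in p.items()}
    return p

def l2_distance(p, q):
    # Euclidean distance between the two flattened parameter vectors.
    return float(np.sqrt(sum(np.sum((p[k] - q[k]) ** 2) for k in p)))

baseline = train(perturb_step=-1, eps=0.0)
for t in (0, 50, 250):
    print(t, l2_distance(baseline, train(perturb_step=t, eps=1e-6)))
```

Because both runs draw data, initialization and minibatches from the same seeded generator, two runs with `eps=0` are bitwise identical; any nonzero distance therefore comes from the perturbation alone.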

Findings

01. Divergence occurs rapidly during early training.

02. Perturbations lead to different loss minima.

03. The divergence caused by a perturbation diminishes rapidly the later in training the perturbation is applied.
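Whether two trained networks occupy different minima is commonly tested by evaluating the loss along the straight line between their parameters, which is the idea behind the loss barrier listed as metric (ii) in the abstract. The sketch below assumes one common definition: the largest gap between the interpolated loss and the linear interpolation of the two endpoint losses. Other work subtracts the mean endpoint loss instead, and the paper's exact definition and number of interpolation points may differ. The double-well `toy_loss` is a hypothetical stand-in for a network's loss over its flattened parameters.

```python
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, num_points=25):
    # Assumed definition: max over alpha of
    #   L((1 - alpha) * theta_a + alpha * theta_b)
    #   - [(1 - alpha) * L(theta_a) + alpha * L(theta_b)]
    alphas = np.linspace(0.0, 1.0, num_points)
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    gaps = [loss_fn((1 - a) * theta_a + a * theta_b) - ((1 - a) * loss_a + a * loss_b)
            for a in alphas]
    return float(max(gaps))

def toy_loss(theta):
    # Hypothetical double-well loss with minima at (-1, 0) and (1, 0).
    return float((theta[0] ** 2 - 1.0) ** 2 + theta[1] ** 2)

left, right = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
print(loss_barrier(toy_loss, left, right))                                 # 1.0
print(loss_barrier(toy_loss, np.array([0.9, 0.0]), np.array([1.1, 0.0])))  # 0.0
```

The two wells are separated by a barrier of height 1, while two points inside the same convex basin give a barrier of 0, which is the distinction the metric is meant to capture.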

Abstract

Neural network training is inherently sensitive to initialization and the randomness induced by stochastic gradient descent. However, it is unclear to what extent such effects lead to meaningfully different networks, either in terms of the models' weights or the underlying functions that were learned. In this work, we show that during the initial "chaotic" phase of training, even extremely small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. We quantify this divergence through (i) $L^{2}$ distance between parameters, (ii) the loss barrier when interpolating between networks, (iii) $L^{2}$ distance and loss barrier between parameters after permutation alignment, and (iv) representational similarity between intermediate activations, revealing how perturbations across different hyperparameter or fine-tuning settings…
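Metrics (iii) and (iv) guard against a subtlety: permuting a network's hidden units changes its weights but not the function it computes, so raw $L^{2}$ distance can overstate how different two networks are. The sketch below shows both ideas for a hypothetical one-hidden-layer tanh MLP stored as a dict of NumPy arrays; the helper names are illustrative, not from the paper. Alignment here brute-forces every ordering of hidden units, which is only feasible for very small widths; practical weight-matching implementations solve a linear assignment problem per layer instead. For representational similarity it uses linear CKA, one widely used measure; the paper's choice of similarity measure may differ.

```python
import itertools
import numpy as np

def permute_hidden(p, perm):
    # Reorder hidden units of a one-hidden-layer MLP; the computed function is unchanged.
    perm = np.asarray(perm)
    return {"W1": p["W1"][perm], "b1": p["b1"][perm], "W2": p["W2"][:, perm], "b2": p["b2"]}

def align_by_weight_matching(p_ref, p_other):
    # Choose the hidden-unit ordering of p_other that minimises its L2 distance to p_ref.
    # Each unit is described by its incoming weights, bias and outgoing weights.
    def unit_vectors(p):
        return np.concatenate([p["W1"], p["b1"][:, None], p["W2"].T], axis=1)
    ref, other = unit_vectors(p_ref), unit_vectors(p_other)
    # Minimising ||ref - other[perm]||^2 is equivalent to maximising their inner product.
    # Brute force over all h! orderings: only feasible for tiny widths.
    best = max(itertools.permutations(range(ref.shape[0])),
               key=lambda perm: float(np.sum(ref * other[list(perm)])))
    return permute_hidden(p_other, best)

def l2_distance(p, q):
    return float(np.sqrt(sum(np.sum((p[k] - q[k]) ** 2) for k in p)))

def hidden_activations(p, X):
    return np.tanh(X @ p["W1"].T + p["b1"])

def linear_cka(A, B):
    # Linear CKA between activation matrices of shape (n_examples, n_units).
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    cross = np.linalg.norm(B.T @ A, "fro") ** 2
    return float(cross / (np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro")))

rng = np.random.default_rng(0)
net_a = {"W1": rng.normal(size=(5, 3)), "b1": rng.normal(size=5),
         "W2": rng.normal(size=(2, 5)), "b2": rng.normal(size=2)}
net_b = permute_hidden(net_a, [3, 0, 4, 1, 2])  # same function, shuffled units
X = rng.normal(size=(100, 3))
print(l2_distance(net_a, net_b) > 0)                               # True
print(l2_distance(net_a, align_by_weight_matching(net_a, net_b)))  # 0.0
print(round(linear_cka(hidden_activations(net_a, X),
                       hidden_activations(net_b, X)), 6))          # 1.0
```

Because `net_b` is an exact relabeling of `net_a`'s hidden units, alignment recovers zero distance and linear CKA, which is invariant to reordering units, equals 1. The aligned parameters can then be compared by distance or by interpolating between them.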

Taxonomy

Topics: Neural Networks and Applications