Latent State Models of Training Dynamics
Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho

TL;DR
This paper introduces a method using hidden Markov models to analyze and interpret the training dynamics of neural networks, revealing phase transitions and latent states that influence convergence and performance.
Contribution
The paper models training as a stochastic process over latent states, fitting an HMM to sequences of metrics collected during training. This makes training trajectories and the phase transitions within them easier to interpret.
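To illustrate what "latent states" means here, the following is a minimal sketch of decoding the most likely state sequence from a single metric trajectory with the Viterbi algorithm. It is not the authors' code: the two states, their Gaussian emission parameters, the transition matrix, and the metric values are all made up for illustration, and the paper fits its HMM from data rather than setting parameters by hand.

```python
import numpy as np

def gaussian_log_pdf(x, means, stds):
    """Log density of each observation under each state's 1-D Gaussian.
    Returns an array of shape (T, K)."""
    x = np.asarray(x, dtype=float)[:, None]
    return -0.5 * np.log(2 * np.pi * stds**2) - 0.5 * ((x - means) / stds) ** 2

def viterbi(log_start, log_trans, log_emit):
    """Most likely latent state path for one sequence (log-space Viterbi)."""
    T, K = log_emit.shape
    score = log_start + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in state i at t-1, then moving to j
        cand = score[:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Trace back from the best final state
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical 2-state model: state 0 = "large weight norm", state 1 = "small weight norm"
start = np.array([0.99, 0.01])
trans = np.array([[0.80, 0.20],
                  [0.01, 0.99]])
means = np.array([5.0, 1.0])
stds = np.array([1.0, 0.5])

metric = [5.2, 4.8, 5.1, 1.3, 0.9, 1.1]  # made-up weight-norm trajectory
path = viterbi(np.log(start), np.log(trans), gaussian_log_pdf(metric, means, stds))
print(path)
```

With these toy parameters the decoded path spends the first three steps in state 0 and the last three in state 1. The switch between them is the kind of change that a latent state model summarizes as a single transition.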
Findings
Identifies latent 'detour' states that slow convergence.
Provides a low-dimensional representation of training dynamics.
Analyzes phase transitions across different tasks.
Abstract
The impact of randomness on model training is poorly understood. How do differences in data order and initialization actually manifest in the model, such that some training runs outperform others or converge faster? Furthermore, how can we interpret the resulting training dynamics and the phase transitions that characterize different trajectories? To understand the effect of randomness on the dynamics and outcomes of neural network training, we train models multiple times with different random seeds and compute a variety of metrics throughout training, such as the norm, mean, and variance of the neural network's weights. We then fit a hidden Markov model (HMM) over the resulting sequences of metrics. The HMM represents training as a stochastic process of transitions between latent states, providing an intuitive overview of significant changes during training. Using our method, we…
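The data-collection step described in the abstract can be sketched as follows. This is an assumption-laden toy, not the paper's setup: `toy_training_run` is a hypothetical stand-in that runs SGD on a small least-squares problem, where the seed controls only initialization and data order, and `weight_metrics` computes just the three metrics the abstract names (norm, mean, variance of the weights).

```python
import numpy as np

def weight_metrics(weights):
    """Summarize a list of weight arrays as [L2 norm, mean, variance]."""
    flat = np.concatenate([np.ravel(w) for w in weights])
    return np.array([np.linalg.norm(flat), flat.mean(), flat.var()])

# Fixed toy dataset shared by all runs (hypothetical; stands in for a real task)
_data_rng = np.random.default_rng(0)
X = _data_rng.normal(size=(64, 8))
Y = X @ _data_rng.normal(size=8)

def toy_training_run(seed, n_epochs=20, lr=0.01):
    """One run of SGD on least squares; returns metrics with shape (n_epochs, 3)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=8)                      # seed-dependent initialization
    trajectory = []
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):       # seed-dependent data order
            w = w - lr * (X[i] @ w - Y[i]) * X[i]
        trajectory.append(weight_metrics([w]))  # record metrics once per epoch
    return np.stack(trajectory)

# One metric sequence per random seed
sequences = [toy_training_run(seed) for seed in range(5)]

# Common layout for fitting an HMM over several sequences:
# one stacked observation matrix plus the length of each sequence
observations = np.concatenate(sequences)
lengths = [len(s) for s in sequences]
```

The stacked `observations` array and `lengths` list follow the layout that HMM libraries commonly expect when fitting on multiple sequences. Fitting the HMM itself is left out of this sketch.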
Taxonomy
Topics: Neural Networks and Applications
