Latent State Models of Training Dynamics

Michael Y. Hu; Angelica Chen; Naomi Saphra; Kyunghyun Cho

arXiv:2308.09543 · cs.LG · January 23, 2024

TL;DR

This paper fits hidden Markov models (HMMs) to metrics collected during neural network training, such as the norm, mean, and variance of the weights, in order to analyze and interpret training dynamics. The fitted model reveals phase transitions and latent states that influence convergence and performance.

Contribution

The paper models training as a stochastic process of transitions between latent states, which gives a clearer view of training trajectories and the phase transitions along them.

Findings

01. Identifies latent 'detour' states that slow convergence.

02. Provides a low-dimensional representation of training dynamics.

03. Analyzes phase transitions across different tasks.

Abstract

The impact of randomness on model training is poorly understood. How do differences in data order and initialization actually manifest in the model, such that some training runs outperform others or converge faster? Furthermore, how can we interpret the resulting training dynamics and the phase transitions that characterize different trajectories? To understand the effect of randomness on the dynamics and outcomes of neural network training, we train models multiple times with different random seeds and compute a variety of metrics throughout training, such as the $L_{2}$ norm, mean, and variance of the neural network's weights. We then fit a hidden Markov model (HMM) over the resulting sequences of metrics. The HMM represents training as a stochastic process of transitions between latent states, providing an intuitive overview of significant changes during training. Using our method, we…
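The metric-collection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`weight_metrics`, `metric_sequence`, `simulate_run`) are hypothetical, the "training run" is a toy random walk over a flat weight list rather than a real network, and only the three metrics named in the abstract are computed.

```python
import math
import random


def weight_metrics(weights):
    """Return the L2 norm, mean, and population variance of a flat weight list."""
    n = len(weights)
    l2_norm = math.sqrt(sum(w * w for w in weights))
    mean = sum(weights) / n
    variance = sum((w - mean) ** 2 for w in weights) / n
    return {"l2_norm": l2_norm, "mean": mean, "variance": variance}


def metric_sequence(checkpoints):
    """Map a list of checkpoints (each a flat weight list) to a list of metric dicts."""
    return [weight_metrics(w) for w in checkpoints]


def simulate_run(seed, num_checkpoints=5, num_weights=100):
    """Hypothetical stand-in for a training run: a seeded random walk over weights."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 0.1) for _ in range(num_weights)]
    checkpoints = []
    for _ in range(num_checkpoints):
        # Assumption: each "training step" is a small random perturbation.
        weights = [w + rng.gauss(0.0, 0.01) for w in weights]
        checkpoints.append(list(weights))
    return checkpoints


# One metric sequence per random seed, mirroring "train models multiple times
# with different random seeds and compute a variety of metrics".
sequences = {seed: metric_sequence(simulate_run(seed)) for seed in (0, 1, 2)}
```

Each entry of `sequences` is one observation sequence. To fit an HMM over them, one option (an assumption here, not something the abstract specifies) is to stack the per-checkpoint metric vectors into a single array and pass it, together with the per-run lengths, to a Gaussian-emission HMM implementation such as `hmmlearn.hmm.GaussianHMM`.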

Code & Models

Repositories

michahu/modeling-training (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications