Two-Time-Scale Learning Dynamics: A Population View of Neural Network Training
Giacomo Borghi, Hyesung Im, Lorenzo Pareschi

TL;DR
This paper develops a mathematical framework for understanding the dynamics of population-based neural network training methods, revealing how fast parameter updates and slow hyperparameter evolution interact and influence learning outcomes.
Contribution
The paper introduces a two-time-scale population dynamics model for neural network training, connecting population-based methods with classical optimization and evolutionary models.
Findings
Large-population limit derived for joint parameter and hyperparameter distribution.
Conditions identified under which populations evolve toward optimal hyperparameters.
Numerical experiments demonstrate the effectiveness of the theoretical model.
Abstract
Population-based learning paradigms, including evolutionary strategies, Population-Based Training (PBT), and recent model-merging methods, combine fast within-model optimisation with slower population-level adaptation. Despite their empirical success, a general mathematical description of the resulting collective training dynamics remains incomplete. We introduce a theoretical framework for neural network training based on two-time-scale population dynamics. We model a population of neural networks as an interacting agent system in which network parameters evolve through fast noisy gradient updates of SGD/Langevin type, while hyperparameters evolve through slower selection-mutation dynamics. We prove the large-population limit for the joint distribution of parameters and hyperparameters and, under strong time-scale separation, derive a selection-mutation equation for the…
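The two time scales described in the abstract can be illustrated with a small toy simulation. The sketch below is not the paper's model: the quadratic loss, the choice of the log learning rate as the single hyperparameter, the pairwise-tournament selection rule, the Gaussian mutation, and every function and parameter name (fast_step, slow_step, run_population, temperature, mutation_std) are assumptions made only for illustration. Each agent takes many noisy gradient steps (fast scale) between occasional selection-mutation steps on its hyperparameter (slow scale).

```python
import numpy as np

def loss(theta):
    # Toy quadratic loss, one value per agent (an assumption, not from the paper).
    return 0.5 * np.sum(theta**2, axis=1)

def grad_loss(theta):
    # Gradient of the toy quadratic loss.
    return theta

def fast_step(theta, log_lr, temperature, rng):
    # Fast scale: one Langevin-type noisy gradient update per agent,
    # using each agent's own step size exp(log_lr).
    lr = np.exp(log_lr)[:, None]
    noise = rng.standard_normal(theta.shape)
    return theta - lr * grad_loss(theta) + np.sqrt(2.0 * lr * temperature) * noise

def slow_step(theta, log_lr, mutation_std, rng):
    # Slow scale, selection: each agent meets one random peer and copies the
    # peer's parameters and hyperparameter if the peer currently has lower loss.
    n = theta.shape[0]
    peers = rng.integers(0, n, size=n)
    fitness = loss(theta)
    adopt = fitness[peers] < fitness
    theta = np.where(adopt[:, None], theta[peers], theta)
    log_lr = np.where(adopt, log_lr[peers], log_lr)
    # Slow scale, mutation: small Gaussian perturbation of the hyperparameter,
    # clipped so the step size stays in a stable range for this toy loss.
    log_lr = log_lr + mutation_std * rng.standard_normal(n)
    log_lr = np.clip(log_lr, np.log(1e-5), np.log(0.5))
    return theta, log_lr

def run_population(n_agents=64, dim=5, n_rounds=30, fast_steps_per_round=20,
                   temperature=1e-3, mutation_std=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n_agents, dim))
    log_lr = rng.uniform(np.log(1e-4), np.log(1e-1), size=n_agents)
    for _ in range(n_rounds):
        # Many fast parameter updates per single slow hyperparameter update.
        for _ in range(fast_steps_per_round):
            theta = fast_step(theta, log_lr, temperature, rng)
        theta, log_lr = slow_step(theta, log_lr, mutation_std, rng)
    return theta, log_lr

if __name__ == "__main__":
    theta, log_lr = run_population()
    print("mean loss:", loss(theta).mean())
    print("mean learning rate:", np.exp(log_lr).mean())
```

The ratio fast_steps_per_round controls how strongly the two scales are separated in this toy setting; increasing it lets each agent's parameters relax further before the population-level selection-mutation step acts on the hyperparameters.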
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks
