Two-Time-Scale Learning Dynamics: A Population View of Neural Network Training

Giacomo Borghi; Hyesung Im; Lorenzo Pareschi

arXiv:2603.19808 · cs.LG · March 26, 2026

TL;DR

This paper develops a mathematical framework for understanding the dynamics of population-based neural network training methods, revealing how fast parameter updates and slow hyperparameter evolution interact and influence learning outcomes.

Contribution

The paper introduces a two-time-scale population dynamics model for neural network training, connecting population-based methods with classical optimization and evolutionary models.

Findings

01. Large-population limit derived for the joint distribution of parameters and hyperparameters.

02. Conditions identified under which the population evolves toward optimal hyperparameters.

03. Numerical experiments demonstrate the effectiveness of the theoretical model.

Abstract

Population-based learning paradigms, including evolutionary strategies, Population-Based Training (PBT), and recent model-merging methods, combine fast within-model optimisation with slower population-level adaptation. Despite their empirical success, a general mathematical description of the resulting collective training dynamics remains incomplete. We introduce a theoretical framework for neural network training based on two-time-scale population dynamics. We model a population of neural networks as an interacting agent system in which network parameters evolve through fast noisy gradient updates of SGD/Langevin type, while hyperparameters evolve through slower selection–mutation dynamics. We prove the large-population limit for the joint distribution of parameters and hyperparameters and, under strong time-scale separation, derive a selection–mutation equation for the…
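
The interplay of the two time scales can be illustrated with a minimal Python sketch. It is a toy simulation under stated assumptions, not the authors' model or code. Each agent carries a scalar parameter theta and one hyperparameter, here assumed to be the Langevin temperature T. The fast phase runs many noisy gradient (Langevin) steps on a hypothetical quadratic loss. The slow phase resamples agents with weights proportional to exp(-beta * averaged loss), copying parameter and temperature together in the spirit of PBT's exploit step, and then mutates T log-normally. The names `fast_phase`, `slow_phase`, `train`, the toy loss, and the constants `beta`, `mut_sigma`, `lr` are all illustrative choices.

```python
import math
import random


def loss(theta):
    # Hypothetical toy objective (assumption): quadratic with minimiser theta* = 1.
    return 0.5 * (theta - 1.0) ** 2


def grad(theta):
    # Exact gradient of the toy loss above.
    return theta - 1.0


def fast_phase(theta, temp, lr, n_steps, rng):
    """Fast time scale: n_steps of discretised Langevin dynamics (noisy SGD).

    Returns the final parameter and the loss averaged over the whole window.
    """
    total = 0.0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        theta = theta - lr * grad(theta) + math.sqrt(2.0 * lr * temp) * noise
        total += loss(theta)
    return theta, total / n_steps


def slow_phase(population, avg_losses, beta, mut_sigma, rng):
    """Slow time scale: one selection-mutation step on the hyperparameter.

    Agents (parameter, temperature) are resampled with probability
    proportional to exp(-beta * avg_loss); each copy's temperature is then
    perturbed multiplicatively, which keeps it positive.
    """
    best = min(avg_losses)
    # Shift by the best loss before exponentiating to avoid underflow.
    weights = [math.exp(-beta * (value - best)) for value in avg_losses]
    parents = rng.choices(population, weights=weights, k=len(population))
    return [(theta, temp * math.exp(mut_sigma * rng.gauss(0.0, 1.0)))
            for theta, temp in parents]


def train(n_agents=32, n_generations=30, fast_steps=100, lr=0.05,
          beta=20.0, mut_sigma=0.1, seed=0):
    """Alternate fast and slow phases; return the mean temperature per generation."""
    rng = random.Random(seed)
    population = [(rng.gauss(0.0, 1.0), rng.uniform(0.1, 2.0))
                  for _ in range(n_agents)]
    history = [sum(temp for _, temp in population) / n_agents]
    for _ in range(n_generations):
        results = [fast_phase(theta, temp, lr, fast_steps, rng)
                   for theta, temp in population]
        # Pair each agent's updated parameter with its (unchanged) temperature.
        population = [(new_theta, temp)
                      for (new_theta, _), (_, temp) in zip(results, population)]
        avg_losses = [avg for _, avg in results]
        population = slow_phase(population, avg_losses, beta, mut_sigma, rng)
        history.append(sum(temp for _, temp in population) / n_agents)
    return history


if __name__ == "__main__":
    for gen, mean_temp in enumerate(train()):
        print(f"generation {gen:2d}: mean temperature {mean_temp:.3f}")
```

Because the slow step only sees the loss averaged over a full fast window, it roughly selects on each temperature's performance at the quasi-stationary state of the fast dynamics, which is the intuition behind strong time-scale separation. For this quadratic loss the stationary expected loss grows with T (about T/2), so the population's mean temperature should drift downward across generations while mutation keeps some diversity.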

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks