Gradient Flow Matching for Learning Update Dynamics in Neural Network Training
Xiao Shou, Yanna Ding, Jianxi Gao

TL;DR
Gradient Flow Matching (GFM) models neural network training as a dynamical system, accurately forecasting weight trajectories and convergence across architectures by learning optimizer-aware vector fields.
Contribution
GFM introduces a continuous-time framework that captures optimizer update rules, enabling accurate extrapolation of training dynamics and convergence prediction.
Findings
GFM achieves forecasting accuracy competitive with Transformer-based models.
GFM outperforms LSTM and classical baselines in training trajectory prediction.
GFM generalizes across different neural architectures and initializations.
Abstract
Training deep neural networks remains computationally intensive due to the iterative nature of gradient-based optimization. We propose Gradient Flow Matching (GFM), a continuous-time modeling framework that treats neural network training as a dynamical system governed by learned optimizer-aware vector fields. By leveraging conditional flow matching, GFM captures the underlying update rules of optimizers such as SGD, Adam, and RMSprop, enabling smooth extrapolation of weight trajectories toward convergence. Unlike black-box sequence models, GFM incorporates structural knowledge of gradient-based updates into the learning objective, facilitating accurate forecasting of final weights from partial training sequences. Empirically, GFM achieves forecasting accuracy that is competitive with Transformer-based models and significantly outperforms LSTM and other classical baselines.…
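To make the idea of forecasting final weights from a partial training sequence concrete, the toy sketch below fits a vector field to the observed update steps of plain gradient descent on a separable quadratic loss, then rolls that field forward from the last observed weights. It is not the authors' GFM implementation and does not use conditional flow matching: the loss, the per-coordinate affine field, and every function name (`simulate_gd`, `fit_affine`, `fit_field`, `extrapolate`) are assumptions chosen only for illustration.

```python
def simulate_gd(w0, curvatures, w_star, lr, steps):
    """Gradient descent on sum_i 0.5 * c_i * (w_i - w*_i)^2; returns all iterates."""
    traj = [list(w0)]
    for _ in range(steps):
        w = traj[-1]
        traj.append([wi - lr * c * (wi - si)
                     for wi, c, si in zip(w, curvatures, w_star)])
    return traj


def fit_affine(xs, ys):
    """Ordinary least squares for y = m * x + b in one dimension."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def fit_field(prefix):
    """Fit one affine map per coordinate from current weight to observed update."""
    field = []
    for i in range(len(prefix[0])):
        xs = [w[i] for w in prefix[:-1]]
        ys = [nxt[i] - cur[i] for cur, nxt in zip(prefix[:-1], prefix[1:])]
        field.append(fit_affine(xs, ys))
    return field


def extrapolate(w, field, steps):
    """Roll the fitted field forward with unit-size Euler steps."""
    w = list(w)
    for _ in range(steps):
        w = [wi + m * wi + b for wi, (m, b) in zip(w, field)]
    return w


# Hypothetical toy problem: two weights with different curvatures.
curvatures = [1.0, 3.0]
w_star = [1.0, -2.0]
full = simulate_gd([4.0, 3.0], curvatures, w_star, lr=0.1, steps=200)

prefix = full[:21]                      # observe only the first 20 updates
field = fit_field(prefix)               # learn the update dynamics
forecast = extrapolate(prefix[-1], field, steps=180)

# Largest per-coordinate gap between forecast and the true final weights.
forecast_error = max(abs(f - t) for f, t in zip(forecast, full[-1]))
```

Because gradient descent on a quadratic is exactly affine in the weights, this fit recovers the update rule and the forecast lands on the true final weights. Real training dynamics are nonlinear and optimizer-dependent, which is the setting the abstract says GFM targets with learned, optimizer-aware vector fields.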
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Reinforcement Learning in Robotics
Methods: Tanh Activation · Sigmoid Activation · Stochastic Gradient Descent · Long Short-Term Memory · Adam
