KPFlow: An Operator Perspective on Dynamic Collapse Under Gradient Descent Training of Recurrent Networks
James Hazelden, Laura Driscoll, Eli Shlizerman, Eric Shea-Brown

TL;DR
This paper introduces KPFlow, a theoretical framework decomposing gradient flow in recurrent networks into operators, revealing how low-dimensional dynamics and task alignment emerge during training.
Contribution
It presents a novel operator-based decomposition of gradient flow in recurrent models, linking neural collapse to network structure and providing tools for analyzing multi-task learning.
Findings
Low-dimensional latent dynamics arise from network structure.
Collapse is driven by network architecture, not just task complexity.
Operators can quantify sub-task objective alignment.
Abstract
Gradient Descent (GD) and its variants are the primary tool for enabling efficient training of recurrent dynamical systems such as Recurrent Neural Networks (RNNs), Neural ODEs and Gated Recurrent Units (GRUs). The dynamics that are formed in these models exhibit features such as neural collapse and emergence of latent representations that may support the remarkable generalization properties of networks. In neuroscience, qualitative features of these representations are used to compare learning in biological and artificial systems. Despite recent progress, there remains a need for theoretical tools to rigorously understand the mechanisms shaping learned representations, especially in finite, non-linear models. Here, we show that the gradient flow, which describes how the model's dynamics evolve over GD, can be decomposed into a product that involves two operators: a Parameter Operator,…
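To make "gradient flow of the model's dynamics" concrete, the following is a minimal sketch, not the paper's decomposition (the abstract is truncated here, so its operators are not reproduced). It illustrates only the generic chain-rule identity: for a small learning rate, one GD step on the parameters changes the hidden trajectory h by approximately -lr * (J J^T) g, where J is the Jacobian of the trajectory with respect to the parameters and g is the gradient of the loss with respect to the trajectory. The toy tanh RNN, the squared-error loss, and the helper names `rollout` and `trajectory_jacobian` are all assumptions made for illustration.

```python
import numpy as np

def rollout(W, x, h0):
    """Run h_{t+1} = tanh(W h_t + x_t) and return the stacked hidden trajectory."""
    h, traj = h0, []
    for x_t in x:
        h = np.tanh(W @ h + x_t)
        traj.append(h)
    return np.concatenate(traj)

def trajectory_jacobian(W, x, h0, eps=1e-6):
    """Central finite-difference Jacobian of the trajectory w.r.t. the entries of W."""
    cols = []
    for i in range(W.size):
        dW = np.zeros(W.size)
        dW[i] = eps
        dW = dW.reshape(W.shape)
        cols.append((rollout(W + dW, x, h0) - rollout(W - dW, x, h0)) / (2 * eps))
    return np.stack(cols, axis=1)  # shape (T * n_hidden, n_params)

# Hypothetical toy problem: sizes, inputs, and targets are arbitrary.
rng = np.random.default_rng(0)
n_hidden, T, lr = 4, 6, 1e-5
W = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
x = rng.normal(size=(T, n_hidden))
h0 = np.zeros(n_hidden)
target = rng.normal(size=T * n_hidden)

h = rollout(W, x, h0)
J = trajectory_jacobian(W, x, h0)
g = h - target  # dL/dh for the assumed loss L = 0.5 * ||h - target||^2

# One GD step in parameter space: dW = -lr * J^T g
W_new = W - lr * (J.T @ g).reshape(W.shape)
actual_change = rollout(W_new, x, h0) - h

# First-order prediction in trajectory space: dh is approximately -lr * (J J^T) g
predicted_change = -lr * (J @ J.T) @ g

rel_err = np.linalg.norm(actual_change - predicted_change) / np.linalg.norm(actual_change)
print(f"relative error of first-order prediction: {rel_err:.2e}")
```

The matrix J J^T acts on trajectories rather than on parameters, which is the sense in which training can be studied as a flow on the dynamics themselves. How the paper factors this flow into its specific operators is not shown in this sketch.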
Taxonomy
Topics: Neural Networks and Reservoir Computing · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
