Towards a theory of learning dynamics in deep state space models
Jakub Smékal, Jimmy T.H. Smith, Michael Kleinman, Dan Biderman, Scott W. Linderman

TL;DR
This paper investigates the learning dynamics of linear state space models, revealing how data structure, model size, and initialization influence training, and draws connections to deep linear networks to advance understanding of deep SSMs.
Contribution
It provides an analytical framework for understanding the learning process of linear SSMs in the frequency domain and links their dynamics to deep linear networks, paving the way for future nonlinear extensions.
Findings
Derives analytical solutions for the learning dynamics of linear SSMs in the frequency domain, under mild assumptions.
Establishes a link between one-dimensional SSMs and the dynamics of deep linear feed-forward networks.
Analyzes how latent state over-parameterization affects convergence time.
Abstract
State space models (SSMs) have shown remarkable empirical performance on many long sequence modeling tasks, but a theoretical understanding of these models is still lacking. In this work, we study the learning dynamics of linear SSMs to understand how covariance structure in data, latent state size, and initialization affect the evolution of parameters throughout learning with gradient descent. We show that focusing on the learning dynamics in the frequency domain affords analytical solutions under mild assumptions, and we establish a link between one-dimensional SSMs and the dynamics of deep linear feed-forward networks. Finally, we analyze how latent state over-parameterization affects convergence time and describe future work in extending our results to the study of deep SSMs with nonlinear connections. This work is a step toward a theory of learning dynamics in deep state space…
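To make the frequency-domain framing concrete, here is a minimal sketch that is not taken from the paper. It assumes a scalar linear SSM of the form x_t = a x_{t-1} + b u_t, y_t = c x_t with zero initial state; the parameter names a, b, c and all function names are illustrative assumptions. It shows only the standard fact that such a model acts as a convolution in time, so its output can equally be computed by pointwise multiplication of Fourier coefficients, which is what makes a per-frequency analysis possible.

```python
import cmath


def run_ssm(a, b, c, inputs):
    """Unroll the assumed scalar SSM x_t = a*x_{t-1} + b*u_t, y_t = c*x_t."""
    x = 0.0  # assumed zero initial state
    outputs = []
    for u in inputs:
        x = a * x + b * u
        outputs.append(c * x)
    return outputs


def ssm_impulse_response(a, b, c, length):
    """Convolution kernel k_t = c * a**t * b of the same SSM."""
    return [c * (a ** t) * b for t in range(length)]


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (kept dependency-free)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def ssm_via_frequency_domain(a, b, c, inputs):
    """Compute the SSM output by multiplying spectra instead of unrolling."""
    n = len(inputs)
    # Zero-pad to length 2n so circular convolution matches linear convolution.
    kernel = ssm_impulse_response(a, b, c, n) + [0.0] * n
    padded_inputs = list(inputs) + [0.0] * n
    product = [K * U for K, U in zip(dft(kernel), dft(padded_inputs))]
    return [value.real for value in idft(product)[:n]]


if __name__ == "__main__":
    u = [1.0, -0.5, 2.0, 0.0, 0.25, -1.0, 0.5, 1.5]
    time_domain = run_ssm(0.9, 0.5, 2.0, u)
    freq_domain = ssm_via_frequency_domain(0.9, 0.5, 2.0, u)
    gap = max(abs(p - q) for p, q in zip(time_domain, freq_domain))
    print("max difference between the two computations:", gap)
```

In the frequency-domain version each Fourier coefficient of the output depends only on the matching coefficient of the input, so the model behaves like an independent scalar gain per frequency. The naive transforms are used only to keep the sketch self-contained; an FFT routine would be the usual choice in practice.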
Taxonomy
Topics: Neural Networks and Applications · Complex Systems and Time Series Analysis
