Delay Embedding Theory of Neural Sequence Models
Mitchell Ostrow, Adam Eisen, Ila Fiete

TL;DR
This paper investigates whether neural sequence models, such as transformers and state-space models, can reconstruct unobserved dynamics from partial observations, connecting delay embedding theory to deep learning.
Contribution
It demonstrates that sequence models can learn delay embeddings of the underlying system, and that state-space models reconstruct unobserved information more effectively at initialization and show greater efficiency.
Findings
State-space models more effectively reconstruct unobserved dynamics at initialization.
Sequence layers can learn viable embeddings of the underlying system.
State-space models achieve lower error on dynamics tasks.
Abstract
To generate coherent responses, language models infer unobserved meaning from their input text sequence. One potential explanation for this capability arises from theories of delay embeddings in dynamical systems, which prove that unobserved variables can be recovered from the history of only a handful of observed variables. To test whether language models are effectively constructing delay embeddings, we measure the capacities of sequence models to reconstruct unobserved dynamics. We trained 1-layer transformer decoders and state-space sequence models on next-step prediction from noisy, partially-observed time series data. We found that each sequence layer can learn a viable embedding of the underlying system. However, state-space models have a stronger inductive bias than transformers: in particular, they more effectively reconstruct unobserved information at initialization, leading to…
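As an illustration of the delay embedding idea the abstract refers to, the sketch below builds a classical delay embedding from a single noisy observed coordinate of a three-dimensional system. It is not the paper's code: the choice of the Lorenz system, the helper names `lorenz_trajectory` and `delay_embed`, and the values of `dim`, `tau`, and the noise level are all assumptions made for this example. Takens' theorem says that, generically, 2d + 1 delays suffice to embed a d-dimensional attractor; in practice fewer are often used.

```python
import numpy as np

def lorenz_trajectory(n_steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                      x0=(1.0, 1.0, 1.0)):
    """Integrate the Lorenz system with fixed-step RK4 (illustrative system choice)."""
    def f(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    traj = np.empty((n_steps, 3))
    s = np.array(x0, dtype=float)
    for i in range(n_steps):
        traj[i] = s
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj

def delay_embed(x, dim, tau):
    """Return an array whose row for time t is [x[t], x[t - tau], ..., x[t - (dim - 1) * tau]].

    Only times t with a full history are kept, so the output has
    len(x) - (dim - 1) * tau rows.
    """
    x = np.asarray(x)
    span = (dim - 1) * tau
    n = len(x) - span
    if n <= 0:
        raise ValueError("time series too short for this dim and tau")
    return np.stack([x[span - k * tau: span - k * tau + n] for k in range(dim)], axis=1)

# Partial, noisy observation: only the x-coordinate is seen (assumed noise level).
rng = np.random.default_rng(0)
states = lorenz_trajectory(2000)
observed = states[:, 0] + 0.1 * rng.standard_normal(len(states))

# dim=3 and tau=10 are hypothetical settings chosen for illustration only.
embedding = delay_embed(observed, dim=3, tau=10)
print(embedding.shape)
```

Each row of `embedding` is a short window of the observed history. The question the paper poses is whether a sequence layer's hidden state ends up carrying equivalent information about the unobserved coordinates (here, `states[:, 1]` and `states[:, 2]`).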
Taxonomy
Topics: Neural Networks and Applications
