A Mechanistic Analysis of Transformers for Dynamical Systems
Gregory Duthé, Nikolaos Evangelou, Wei Liu, Ioannis G. Kevrekidis, Eleni Chatzi

TL;DR
This paper investigates how single-layer Transformers process dynamical systems, revealing their capabilities and limitations through a dynamical systems perspective, and explaining their success or failure in modeling time-series data.
Contribution
It provides a theoretical analysis of Transformer mechanisms in dynamical systems, connecting empirical observations with classical theory and identifying operational regimes.
Findings
The convexity constraint of softmax attention limits the dynamics that can be represented in linear systems.
Attention acts as an adaptive delay-embedding in nonlinear, partially observable systems.
Oversmoothing occurs in oscillatory linear systems due to softmax attention.
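The convexity constraint behind the first and third findings can be illustrated with a minimal NumPy sketch. It is not the authors' code: the random projections `Wq` and `Wk`, the cosine test signal, and the choice to use the raw signal as the values (no value or output projection) are assumptions made here to isolate the softmax averaging step. Because each row of softmax weights is nonnegative and sums to one, every output is a convex combination of past inputs and cannot leave the range the signal has visited so far.

```python
import numpy as np

def causal_softmax_attention(q, k, v):
    """Single-head causal softmax attention; returns outputs and weights."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Causal mask: position t may only attend to positions s <= t.
    mask = np.tril(np.ones_like(scores, dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
t = np.arange(64)
x = np.cos(0.3 * t)[:, None]          # oscillatory scalar signal, shape (T, 1)

# Hypothetical random query/key projections (illustration only).
Wq = rng.normal(size=(1, 4))
Wk = rng.normal(size=(1, 4))

# Values are the raw signal, so the output is a pure weighted average.
out, weights = causal_softmax_attention(x @ Wq, x @ Wk, x)

# Range of the signal seen up to each time step.
running_min = np.minimum.accumulate(x[:, 0])
running_max = np.maximum.accumulate(x[:, 0])

inside_hull = np.all((out[:, 0] >= running_min - 1e-9) &
                     (out[:, 0] <= running_max + 1e-9))
print("outputs stay inside the running range:", inside_hull)
```

In a full Transformer layer, value and output projections, residual connections, and the MLP act on top of this average, so the sketch shows only the constraint on the attention step itself.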
Abstract
Transformers are increasingly adopted for modeling and forecasting time series, yet their internal mechanisms remain poorly understood from a dynamical systems perspective. In contrast to classical autoregressive and state-space models, which benefit from well-established theoretical foundations, Transformer architectures are typically treated as black boxes. This gap becomes particularly relevant as attention-based models are considered for general-purpose or zero-shot forecasting across diverse dynamical regimes. In this work, we do not propose a new forecasting model, but instead investigate the representational capabilities and limitations of single-layer Transformers when applied to dynamical data. Building on a dynamical systems perspective, we interpret causal self-attention as a linear, history-dependent recurrence and analyze how it processes temporal information. Through a…
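One way to read the phrase "linear, history-dependent recurrence" is through the standard definition of single-head causal softmax attention, written out below. The notation ($x_s$ for inputs, $W_Q$, $W_K$, $W_V$ for projections, $d$ for the key dimension, $\alpha_{t,s}$ for attention weights) is assumed here for illustration and may differ from the paper's own.

```latex
y_t = \sum_{s=1}^{t} \alpha_{t,s}\, W_V x_s,
\qquad
\alpha_{t,s} = \frac{\exp\!\left(q_t^\top k_s / \sqrt{d}\right)}
                    {\sum_{r=1}^{t} \exp\!\left(q_t^\top k_r / \sqrt{d}\right)},
\qquad
q_t = W_Q x_t,\quad k_s = W_K x_s .
```

For fixed weights $\alpha_{t,s}$ the output $y_t$ is linear in the past values, while the weights themselves depend on the observed history $x_1, \dots, x_t$. Since $\alpha_{t,s} \ge 0$ and $\sum_{s \le t} \alpha_{t,s} = 1$, each $y_t$ is a convex combination of the projected past states.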
Taxonomy
Topics: Neural Networks and Reservoir Computing · Ecosystem dynamics and resilience · Model Reduction and Neural Networks
