Transformers for dynamical systems learn transfer operators in-context
Anthony Bao, Jeffrey Lai, William Gilpin

TL;DR
This paper demonstrates that attention-based transformer models can learn to forecast unseen dynamical systems by leveraging in-context learning, delay embedding, and invariant set identification, without retraining.
Contribution
It reveals how transformers apply transfer-operator strategies in-context, enabling zero-shot forecasting of different physical systems through dynamical manifold detection.
Findings
Transformers use delay embedding to lift low-dimensional data.
Models identify and forecast long-lived invariant sets.
An early training tradeoff between in-distribution and out-of-distribution performance manifests as a secondary double descent.
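Delay embedding, the first finding above, lifts a scalar time series into higher-dimensional vectors built from lagged copies of itself. The sketch below is a generic illustration of that technique, not the paper's code. The function name `delay_embed`, the vector ordering `[x_t, x_{t-tau}, ...]`, and the sine-wave example are all assumptions chosen for clarity. A single observed coordinate of circular motion, embedded with a quarter-period lag, recovers the underlying circle.

```python
import math
from typing import List, Sequence


def delay_embed(x: Sequence[float], dim: int, tau: int) -> List[List[float]]:
    """Hypothetical helper: build delay vectors [x_t, x_{t-tau}, ..., x_{t-(dim-1)tau}]."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # earliest t that has a full history of lags
    return [[x[t - k * tau] for k in range(dim)] for t in range(span, len(x))]


# A scalar observation of circular motion: x_t = sin(2*pi*t/period).
period = 20
series = [math.sin(2 * math.pi * t / period) for t in range(100)]

# With a quarter-period lag, x_{t-tau} = -cos(2*pi*t/period), so each
# 2-D delay vector lies on the unit circle: the hidden state is recovered.
vectors = delay_embed(series, dim=2, tau=period // 4)
radii = [math.hypot(a, b) for a, b in vectors]
print(len(vectors), min(radii), max(radii))
```

The lag `tau` and dimension `dim` are free parameters here; for real data they are usually tuned, whereas the paper reports that the transformer learns an embedding of this kind in-context.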
Abstract
Large-scale foundation models for scientific machine learning adapt to physical settings unseen during training, such as zero-shot transfer between turbulent scales. This phenomenon, in-context learning, challenges conventional understanding of learning and adaptation in physical systems. Here, we study in-context learning of dynamical systems in a minimal setting: we train a small two-layer, single-head transformer to forecast one dynamical system, and then evaluate its ability to forecast a different dynamical system without retraining. We discover an early tradeoff in training between in-distribution and out-of-distribution performance, which manifests as a secondary double descent phenomenon. We discover that attention-based models apply a transfer-operator forecasting strategy in-context. They (1) lift low-dimensional time series using delay embedding, to detect the system's…
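The abstract describes the learned forecasting strategy as a transfer-operator method. As background only, and not as the authors' mechanism, here is a minimal sketch of Ulam's method, a classical way to discretize a transfer (Perron-Frobenius) operator. It partitions the state space into bins, estimates bin-to-bin transition probabilities from sampled points, and repeatedly pushes a density forward with the resulting matrix until it settles near the invariant density. The logistic map at r = 4, the bin count, the sample count, and the names `ulam_matrix` and `push_forward` are illustrative assumptions. For this map the exact invariant density is 1/(pi*sqrt(x(1-x))), so probability mass should pile up near the edges of [0, 1].

```python
import random
from typing import List


def logistic(x: float, r: float = 4.0) -> float:
    """One step of the logistic map, used here as a stand-in dynamical system."""
    return r * x * (1.0 - x)


def ulam_matrix(n_bins: int, samples_per_bin: int = 200, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: estimate P[i][j] ~ Prob(next state in bin j | state in bin i)."""
    rng = random.Random(seed)
    P = [[0.0] * n_bins for _ in range(n_bins)]
    for i in range(n_bins):
        for _ in range(samples_per_bin):
            x = (i + rng.random()) / n_bins  # uniform sample inside bin i
            j = min(int(logistic(x) * n_bins), n_bins - 1)  # bin of the image point
            P[i][j] += 1.0 / samples_per_bin
    return P


def push_forward(p: List[float], P: List[List[float]]) -> List[float]:
    """Apply the discretized transfer operator to a probability vector over bins."""
    n = len(p)
    q = [0.0] * n
    for i in range(n):
        for j in range(n):
            q[j] += p[i] * P[i][j]
    return q


n_bins = 20
P = ulam_matrix(n_bins)
density = [1.0 / n_bins] * n_bins  # start from a uniform density
for _ in range(500):
    density = push_forward(density, P)
print([round(v, 3) for v in density])
```

Because every row of `P` sums to one, each push-forward conserves total probability; the fixed point of this iteration approximates the system's invariant measure, a discrete analogue of the long-lived invariant sets the findings refer to.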
