Panda: A pretrained forecast model for chaotic dynamics
Jeffrey Lai, Anthony Bao, William Gilpin

TL;DR
Panda is a pretrained model, motivated by dynamical systems theory, that forecasts chaotic systems. It shows emergent properties and generalizes to unseen systems and real-world data.
Contribution
We introduce Panda, a model pretrained purely on synthetic chaotic data that forecasts unseen nonlinear dynamical systems zero-shot.
Findings
Panda achieves accurate zero-shot forecasting of unseen chaotic systems.
Panda exhibits nonlinear resonance patterns in attention heads.
Panda can forecast the dynamics of partial differential equations without retraining.
Abstract
Chaotic systems are intrinsically sensitive to small errors, challenging efforts to construct predictive data-driven models of real-world dynamical systems such as fluid flows or neuronal activity. Prior efforts comprise either specialized models trained on individual time series, or foundation models trained on vast time series databases with little underlying dynamical structure. Motivated by dynamical systems theory, we present Panda, Patched Attention for Nonlinear DynAmics. We train Panda on a novel synthetic, extensible dataset of chaotic dynamical systems that we discover using an evolutionary algorithm. Trained purely on simulated data, Panda exhibits emergent properties: zero-shot forecasting of unseen chaotic systems preserving both short-term accuracy and distributional measures, nonlinear resonance patterns in attention heads, and effective prediction of…
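The abstract's opening claim, that chaotic systems are sensitive to small errors, can be illustrated with a minimal sketch. It integrates the Lorenz system with its standard parameters from two initial conditions that differ by 1e-8 and tracks how far apart the trajectories drift. The choice of the Lorenz system, the step size, and the helper names `lorenz` and `rk4_trajectory` are assumptions made for illustration; nothing here is taken from the Panda dataset or code.

```python
import numpy as np


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (standard chaotic parameters)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def rk4_trajectory(f, x0, dt, n_steps):
    """Integrate dx/dt = f(x) with fixed-step fourth-order Runge-Kutta."""
    traj = np.empty((n_steps + 1, len(x0)))
    traj[0] = x0
    for i in range(n_steps):
        x = traj[i]
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        traj[i + 1] = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj


# Two initial conditions that differ by a tiny perturbation (illustrative values).
x0 = np.array([1.0, 1.0, 1.0])
perturbation = np.array([1e-8, 0.0, 0.0])

traj_a = rk4_trajectory(lorenz, x0, dt=0.01, n_steps=3000)
traj_b = rk4_trajectory(lorenz, x0 + perturbation, dt=0.01, n_steps=3000)

# Euclidean distance between the two trajectories at every time step.
separation = np.linalg.norm(traj_a - traj_b, axis=1)
print("initial separation:", separation[0])
print("largest separation:", separation.max())
```

The separation starts at the size of the perturbation and grows by many orders of magnitude over the run, which is why pointwise forecast error alone is a limited measure for such systems and why the abstract also refers to distributional measures.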
Peer Reviews
Decision: ICLR 2026 Poster
1. The proposed model shows excellent performance in both short-term trajectory prediction and long-term prediction of statistical properties. It is carefully designed for chaotic systems in particular and contains many architectural considerations that will be useful to researchers working on similar problems.
2. The automated dataset generation process is also novel, and the suite of criteria used to automatically sift for chaotic systems seems quite useful. The final dataset of 20000
1. This is a very well written paper, and there seem to be no significant weaknesses.
1. Panda shows great scaling behavior in both the amount of data and the number of parameters.
2. Great to see DST-motivated architectural design and the focus on setting a suitable inductive bias, which other FMs in the field lack.
3. The manuscript is well written and easy to follow.
4. Generating new chaotic systems using skew-product coupling and evolutionary search is a great idea and a powerful method to generate vast amounts of valid data for this class of FMs. I think this greatly benefi
My two main concerns are the following: 1. The paper misses relevant literature which introduces another DS FM model: DynaMix [1]. Both methods train on a synthetic dataset comprised of low-d chaotic DS, both methods have the aim of zero-shot forecasting and both question the efficacy of existing TS foundation models. I think it would be highly valuable to the SciML community if Panda is compared to such a similarly specialized model, which seems to perform much better than context parroting arc
1. **Synthetic dataset generation**: The evolutionary discovery framework is interesting and well-executed. I think this is a valuable contribution.
2. **Scaling law for dynamics**: The finding that diversity of unique systems matters more than total data volume is very interesting scientifically.
3. **Architectural choices**: Channel attention for coupled systems and dynamics embeddings are good.
4. **Experimental results**: The evaluations are comprehensive and the results strong compared to b
1. **Clarity**: While the paper is generally very well presented, there are a few places where clarity could be improved. For example, the fact that training uses only 3 randomly sampled channels while evaluation covers arbitrary dimensions is buried in Appendix B, but for me this is an important detail. In addition, the paper doesn't explicitly state how the univariate baselines are applied to multivariate data: do they process channels independently and concatenate forecasts? How does Chronos-SFT hand
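The channel-independent convention this review asks about can be sketched as follows. This is only one possible reading of "process channels independently and concatenate forecasts", not a description of what the paper or any baseline actually does. The names `forecast_channelwise` and `last_value_forecaster` are hypothetical, and the last-value forecaster is a deliberately trivial stand-in for a real univariate model.

```python
import numpy as np


def last_value_forecaster(series, horizon):
    """Toy univariate forecaster (hypothetical stand-in): repeat the last value."""
    return np.full(horizon, series[-1])


def forecast_channelwise(context, horizon, forecaster):
    """Apply a univariate forecaster to each channel separately.

    context: array of shape (time, channels)
    returns: array of shape (horizon, channels)
    """
    context = np.asarray(context, dtype=float)
    per_channel = [
        forecaster(context[:, c], horizon) for c in range(context.shape[1])
    ]
    # Stack the per-channel forecasts back into one multivariate forecast.
    return np.stack(per_channel, axis=1)


# Small illustrative context with 3 time steps and 2 channels.
context = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]])
pred = forecast_channelwise(context, horizon=4, forecaster=last_value_forecaster)
print(pred.shape)
```

Under this convention no information is shared across channels, so any coupling between variables of the underlying system is invisible to the forecaster.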
Taxonomy
Methods: Softmax · Attention Is All You Need
