TL;DR
The paper shows that time-series foundation models often forecast by parroting segments of their context. A naive parroting baseline outperforms these far more complex models at a tiny fraction of the computational cost, and the analysis sheds light on the models' failure modes and scaling behavior.
Contribution
It introduces context parroting as a simple forecasting baseline, draws a parallel between parroting and induction heads, and links forecast accuracy to the fractal dimension of chaotic attractors, guiding future model design.
Findings
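A naive context parroting model outperforms leading time-series foundation models across diverse dynamical systems, including low-dimensional chaos, turbulence, coupled oscillators, and electrocardiograms.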
Forecast accuracy scales with the fractal dimension of the attractor.
When foundation models are not parroting, they share failure modes such as converging to the mean.
Abstract
Recent time-series foundation models exhibit strong abilities to predict physical systems. These abilities include zero-shot forecasting, in which a model forecasts future states of a system given only a short trajectory as context, without knowledge of the underlying physics. Here, we show that foundation models often forecast through a simple parroting strategy, and when they are not parroting they exhibit some shared failure modes such as converging to the mean. As a result, a naive context parroting model that copies directly from the context scores higher than leading time-series foundation models on predicting a diverse range of dynamical systems, including low-dimensional chaos, turbulence, coupled oscillators, and electrocardiograms, at a tiny fraction of the computational cost. We draw a parallel between context parroting and induction heads, which explains recent works showing…
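The idea of a model that "copies directly from the context" can be illustrated with a minimal sketch. This is not the authors' implementation: the matching rule (nearest neighbor under Euclidean distance), the function name `parrot_forecast`, and the `motif_len` parameter are all assumptions made for illustration. The sketch finds the earlier stretch of context that best matches the most recent `motif_len` points and then replays whatever followed that stretch.

```python
import numpy as np

def parrot_forecast(context, horizon, motif_len=8):
    """Hypothetical parroting baseline: replay what followed the best past match.

    context   : array of shape (time,) or (time, dim)
    horizon   : number of future steps to produce
    motif_len : assumed length of the query motif (illustrative choice)
    """
    context = np.asarray(context, dtype=float)
    was_1d = context.ndim == 1
    if was_1d:
        context = context[:, None]  # treat a 1-D series as (time, 1)
    n = len(context)
    if n < motif_len + 1:
        raise ValueError("context must be longer than motif_len")

    # The query is the most recent motif_len points of the context.
    query = context[n - motif_len:]

    # Candidate windows start at 0 .. n - motif_len - 1, so the query itself
    # is excluded and every candidate has at least one point after it.
    best_start, best_dist = 0, np.inf
    for start in range(n - motif_len):
        window = context[start:start + motif_len]
        dist = np.linalg.norm(window - query)
        if dist < best_dist:
            best_start, best_dist = start, dist

    # Replay the points that followed the best match. If the horizon runs
    # past the end of the context, loop back to the first copied point.
    copy_from = best_start + motif_len
    period = n - copy_from
    idx = copy_from + (np.arange(horizon) % period)
    forecast = context[idx]
    return forecast[:, 0] if was_1d else forecast
```

On an exactly periodic signal the sketch reproduces the continuation, for example `parrot_forecast([0, 1, 2, 3] * 5, horizon=6, motif_len=4)` replays the cycle starting from 0. On chaotic data the copied segment would only track the true trajectory for as long as the matched motif stays close to the current state.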