Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting
Alper Yıldırım

TL;DR
This paper investigates whether transformer representations for time series forecasting rely on superposition. It finds that the representations are sparse and stable, and that superposition is not necessary for competitive performance.
Contribution
It provides the first mechanistic interpretability analysis showing transformer FFN representations for time series are sparse and do not depend on superposition.
Findings
Expanding the SAE dictionary size (from 0.5x to 4.0x the native dimensionality) has negligible impact on performance.
Representations remain sparse and largely unaffected by latent interventions.
Superposition is not empirically necessary for competitive forecasting performance.
Abstract
Transformer architectures have been widely adopted for time series forecasting, yet whether the representational mechanisms that make them powerful in NLP actually engage on time series data remains unexplored. The persistent competitiveness of simple linear models such as DLinear has fueled ongoing debate, but no mechanistic explanation for this phenomenon has been offered. We address this gap by applying sparse autoencoders (SAEs), a tool from mechanistic interpretability, to probe the internal representations of PatchTST. We first establish that a single-layer, narrow-dimensional transformer matches the forecasting performance of deeper configurations across commonly used benchmarks. We then train SAEs on the post-GELU intermediate FFN activations with dictionary sizes ranging from 0.5x to 4.0x the native dimensionality. Expanding the dictionary yields negligible downstream…
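The SAE setup described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the ReLU encoder, the L1 sparsity penalty, the intermediate ratios 1.0 and 2.0, the native width of 64, and all names (`dictionary_sizes`, `SparseAutoencoder`, `l1_coeff`) are assumptions chosen for illustration. Only the 0.5x to 4.0x range and the choice of post-GELU FFN activations as input come from the text. The sketch shows the forward pass and loss only, with no training loop.

```python
import numpy as np


def dictionary_sizes(d_native, ratios=(0.5, 1.0, 2.0, 4.0)):
    """Dictionary widths for each expansion ratio.

    Only the 0.5x and 4.0x endpoints come from the abstract; the
    intermediate ratios are assumed for illustration.
    """
    return [int(round(r * d_native)) for r in ratios]


class SparseAutoencoder:
    """Generic ReLU sparse autoencoder (an assumed, common formulation)."""

    def __init__(self, d_in, d_dict, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_in)
        self.W_enc = rng.normal(0.0, scale, (d_in, d_dict))
        self.b_enc = np.zeros(d_dict)
        self.W_dec = rng.normal(0.0, scale, (d_dict, d_in))
        self.b_dec = np.zeros(d_in)

    def encode(self, x):
        # Non-negative latent codes; sparsity is encouraged by the L1 term in loss().
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, z):
        return z @ self.W_dec + self.b_dec

    def loss(self, x, l1_coeff=1e-3):
        z = self.encode(x)
        x_hat = self.decode(z)
        recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))  # reconstruction error
        sparsity = np.mean(np.sum(np.abs(z), axis=-1))      # L1 penalty on latents
        return recon + l1_coeff * sparsity, z, x_hat


# Stand-in for post-GELU FFN activations; real inputs would come from PatchTST.
d_native = 64  # hypothetical FFN width
acts = np.abs(np.random.default_rng(1).normal(size=(8, d_native)))

for d_dict in dictionary_sizes(d_native):
    sae = SparseAutoencoder(d_native, d_dict)
    total, z, x_hat = sae.loss(acts)
    print(d_dict, z.shape, x_hat.shape)
```

Under these assumptions, a ratio below 1.0 gives an undercomplete dictionary and a ratio above 1.0 gives an overcomplete one. Comparing downstream performance across the two regimes is what lets the dictionary-size sweep speak to whether superposition is needed.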
