Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting

Alper Yıldırım

arXiv:2605.05151 · cs.LG · May 7, 2026

TL;DR

This paper investigates whether transformer representations for time series forecasting rely on superposition. It finds that the representations are sparse and stable, and that superposition is not necessary for competitive performance.

Contribution

It provides the first mechanistic interpretability analysis showing transformer FFN representations for time series are sparse and do not depend on superposition.

Findings

01

Expanding the dictionary size has negligible impact on performance.

02

Representations remain sparse and largely unaffected by latent interventions.

03

Superposition is not empirically necessary for competitive forecasting performance.

Abstract

Transformer architectures have been widely adopted for time series forecasting, yet whether the representational mechanisms that make them powerful in NLP actually engage on time series data remains unexplored. The persistent competitiveness of simple linear models such as DLinear has fueled ongoing debate, but no mechanistic explanation for this phenomenon has been offered. We address this gap by applying sparse autoencoders (SAEs), a tool from mechanistic interpretability, to probe the internal representations of PatchTST. We first establish that a single-layer, narrow-dimensional transformer matches the forecasting performance of deeper configurations across commonly used benchmarks. We then train SAEs on the post-GELU intermediate FFN activations with dictionary sizes ranging from 0.5x to 4.0x the native dimensionality. Expanding the dictionary yields negligible downstream…
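The dictionary-size sweep described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the SAE architecture, so everything here is an assumption: a ReLU encoder, a linear decoder, an L1 sparsity penalty, the intermediate expansion factors 1.0 and 2.0 (only the 0.5x and 4.0x endpoints come from the abstract), and the hypothetical names `dictionary_sizes`, `init_sae`, and `sae_forward`. It is not the author's implementation.

```python
import numpy as np


def dictionary_sizes(d_ff, factors=(0.5, 1.0, 2.0, 4.0)):
    """Map expansion factors to SAE dictionary widths.

    Only the 0.5x and 4.0x endpoints are stated in the abstract;
    the intermediate factors are illustrative assumptions.
    """
    return [int(d_ff * f) for f in factors]


def init_sae(d_in, d_dict, rng):
    """Randomly initialise a one-hidden-layer sparse autoencoder (assumed form)."""
    scale = 1.0 / np.sqrt(d_in)
    return {
        "W_enc": rng.normal(0.0, scale, (d_in, d_dict)),
        "b_enc": np.zeros(d_dict),
        "W_dec": rng.normal(0.0, scale, (d_dict, d_in)),
        "b_dec": np.zeros(d_in),
    }


def sae_forward(params, x, l1_coeff=1e-3):
    """Encode activations x of shape (batch, d_in) and reconstruct them.

    Returns the non-negative latent codes, the reconstruction, and a
    loss equal to mean squared reconstruction error plus an L1 penalty.
    """
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    z = np.maximum(0.0, pre)                      # ReLU keeps codes non-negative
    x_hat = z @ params["W_dec"] + params["b_dec"]
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(z), axis=-1))
    return z, x_hat, recon + l1_coeff * sparsity
```

In a sweep of this kind, one SAE would be trained per width returned by `dictionary_sizes`, with `x` standing in for the post-GELU intermediate FFN activations collected from the forecasting model. Comparing downstream forecasting error across widths is then what indicates whether a larger dictionary recovers anything the native dimensionality misses.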
