Stream: Scaling up Mechanistic Interpretability to Long Context in LLMs via Sparse Attention

J Rosser; José Luis Redondo García; Gustavo Penha; Konstantina Palla; Hugues Bouchard

arXiv:2510.19875·cs.CL·February 3, 2026

Open Access

TL;DR

This paper introduces Stream, a scalable hierarchical pruning algorithm that enables efficient interpretability of long-context attention patterns in large language models, making analysis feasible on consumer hardware.

Contribution

The paper presents Sparse Tracing, a technique that uses dynamic sparse attention to analyze long-context attention patterns, and Stream, a hierarchical pruning algorithm that estimates per-head sparse attention masks in near-linear time $O(T \log T)$ and linear space $O(T)$.
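To make the scaling claim concrete, here is a rough back-of-the-envelope estimate of what storing dense attention patterns costs. The layer count, head count, and 2-byte score width are hypothetical values chosen for illustration, not figures from the paper; the result is consistent in order of magnitude with the terabyte-scale memory the abstract cites for 100,000 tokens.

```python
import math


def dense_attention_bytes(tokens: int, layers: int, heads: int,
                          bytes_per_score: int = 2) -> int:
    """Bytes needed to store one T x T attention map for every head.

    layers, heads, and bytes_per_score are illustrative assumptions.
    """
    return layers * heads * tokens * tokens * bytes_per_score


def quadratic_vs_loglinear(tokens: int) -> float:
    """Ratio T^2 / (T log2 T): how much faster quadratic cost grows."""
    return (tokens * tokens) / (tokens * math.log2(tokens))


# Hypothetical 32-layer, 32-head model with fp16 scores at 100,000 tokens.
total = dense_attention_bytes(100_000, layers=32, heads=32)
print(total / 1e12)  # 20.48 (decimal terabytes)
print(quadratic_vs_loglinear(100_000))  # several thousand times more work
```

Constant factors are ignored in the ratio, so it indicates only the growth gap between quadratic and $O(T \log T)$ cost, not measured speedups.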

Findings

01

On long chain-of-thought reasoning traces, Stream retains critical attention paths while pruning 97-99% of token interactions.

02

On the RULER benchmark, Stream preserves key retrieval routes while discarding 90-96% of interactions.

03

Stream enables long-context interpretability on consumer GPUs.

Abstract

As Large Language Models (LLMs) scale to million-token contexts, traditional Mechanistic Interpretability techniques for analyzing attention scale quadratically with context length, demanding terabytes of memory beyond 100,000 tokens. We introduce Sparse Tracing, a novel technique that leverages dynamic sparse attention to efficiently analyze long-context attention patterns. We present Stream, a compilable hierarchical pruning algorithm that estimates per-head sparse attention masks in near-linear time $O(T \log T)$ and linear space $O(T)$, enabling one-pass interpretability at scale. Stream performs a binary-search-style refinement to retain only the top-$k$ key blocks per query while preserving the model's next-token behavior. We apply Stream to long chain-of-thought reasoning traces and identify thought anchors while pruning 97-99% of token interactions. On the RULER benchmark,…
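The binary-search-style refinement described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mean-pooled span score, and the `min_block` stopping rule are all assumptions made for illustration, and causal masking, per-head handling, and the check that next-token behavior is preserved are omitted. Starting from one span covering all keys, each round halves the surviving spans and keeps the `k` best, so a single query scores at most `2k` spans per round over about $\log_2 T$ rounds.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open key range [lo, hi)


def prefix_sums(keys: Sequence[Sequence[float]]) -> List[List[float]]:
    """prefix[i] is the element-wise sum of keys[0:i]; uses O(T) space."""
    dim = len(keys[0])
    prefix = [[0.0] * dim]
    for key in keys:
        prefix.append([p + x for p, x in zip(prefix[-1], key)])
    return prefix


def span_score(q: Sequence[float], prefix: List[List[float]],
               lo: int, hi: int) -> float:
    """Dot product of q with the mean key in [lo, hi).

    Mean pooling is an assumption of this sketch, not the paper's scoring.
    """
    n = hi - lo
    return sum(qd * (b - a) / n
               for qd, a, b in zip(q, prefix[lo], prefix[hi]))


def refine_top_k(q: Sequence[float], prefix: List[List[float]],
                 k: int, min_block: int) -> List[Span]:
    """Halve the surviving spans each round and keep the k best-scoring."""
    if k < 1 or min_block < 1:
        raise ValueError("k and min_block must be at least 1")
    spans: List[Span] = [(0, len(prefix) - 1)]
    while any(hi - lo > min_block for lo, hi in spans):
        children: List[Span] = []
        for lo, hi in spans:
            if hi - lo > min_block:
                mid = (lo + hi) // 2
                children += [(lo, mid), (mid, hi)]
            else:
                children.append((lo, hi))
        children.sort(key=lambda s: span_score(q, prefix, s[0], s[1]),
                      reverse=True)
        spans = children[:k]
    return sorted(spans)


# Toy data: 16 keys, with only key 11 aligned with the query.
keys = [[0.0, 1.0] for _ in range(16)]
keys[11] = [5.0, 0.0]
q = [1.0, 0.0]
print(refine_top_k(q, prefix_sums(keys), k=1, min_block=2))  # [(10, 12)]
```

One limitation of the pooling choice in this sketch: averaging can dilute a single strongly matching key inside a large span, so a coarse round may discard it. Raising `k` trades extra work for fewer such misses.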

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Big Data and Digital Economy · Multimodal Machine Learning Applications