Sparse or Dense? A Mechanistic Estimation of Computation Density in Transformer-based LLMs

Corentin Kervadec; Iuliia Lysova; Marco Baroni; Gemma Boleda

arXiv:2601.22795·cs.CL·April 16, 2026


TL;DR

This paper introduces a method to measure computation density in transformer-based LLMs. It finds that processing is generally dense, that density varies with the input, and that density correlates with token rarity and context length.

Contribution

The paper presents a novel density estimator based on mechanistic interpretability and uses it to challenge the assumption that LLM computation is sparse and uniform.

Findings

01

LLM processing generally involves dense rather than sparse computation.

02

Computation density varies depending on input and context.

03

Rarer tokens and shorter contexts tend to increase density.

Abstract

Transformer-based large language models (LLMs) comprise billions of parameters arranged in deep and wide computational graphs. Several studies on LLM efficiency optimization argue that it is possible to prune a significant portion of the parameters while only marginally impacting performance. This suggests that the computation is not uniformly distributed across the parameters. Here we introduce a technique to systematically quantify computation density in LLMs. In particular, we design a density estimator drawing on mechanistic interpretability. We experimentally test our estimator and find that: (1) contrary to what has often been assumed, LLM processing generally involves dense computation; (2) computation density is dynamic, in the sense that models shift between sparse and dense processing regimes depending on the input; (3) per-input density is significantly correlated…
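
The abstract does not spell out how the estimator works, so the following is only a toy illustration of what a per-input "density" number could look like, not the authors' method. It assumes each model component already has an importance score for a given input (the scores, the function name `toy_density`, and the 90% coverage threshold are all hypothetical), and it reports the fraction of components needed to account for that share of the total absolute score.

```python
import numpy as np

def toy_density(scores, coverage=0.9):
    """Fraction of components needed to cover `coverage` of total |score|.

    Hypothetical proxy for computation density; NOT the paper's estimator.
    Values near 0 mean a few components dominate (sparse);
    values near 1 mean importance is spread out (dense).
    """
    # Sort absolute importance scores from largest to smallest.
    mass = np.sort(np.abs(np.asarray(scores, dtype=float)))[::-1]
    total = mass.sum()
    if total == 0.0:
        return 0.0  # no component matters at all
    cumulative = np.cumsum(mass) / total
    # Index of the first component at which coverage is reached.
    needed = int(np.searchsorted(cumulative, coverage, side="left")) + 1
    return min(needed, mass.size) / mass.size

# One dominant component: sparse regime.
print(toy_density([10, 0, 0, 0]))  # 0.25
# Equal importance everywhere: dense regime.
print(toy_density([1, 1, 1, 1]))   # 1.0
```

Under this toy definition, the same model can yield different values for different inputs, which is the kind of input-dependent shift between sparse and dense regimes that the abstract describes.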
