# Universal Properties of Activation Sparsity in Modern Large Language Models

**Authors:** Filip Szatkowski, Patryk Będkowski, Alessio Devoto, Jan Dubiński, Pasquale Minervini, Mikołaj Piórczyński, Simone Scardapane, Bartosz Wójcik

arXiv: 2509.00454 · 2026-02-19

## TL;DR

This paper studies activation sparsity in the feedforward (FFN) layers of modern large language models. It reports properties that hold across model families and scales, finds that the potential for effective sparsity grows with model size, and presents the first study of activation sparsity in diffusion-based LLMs.

## Contribution

The paper introduces a general framework for evaluating sparsity robustness in contemporary LLMs and uses it to uncover universal properties of activation sparsity across diverse model families and scales, including diffusion-based LLMs.

## Key findings

- The potential for effective activation sparsity grows with model size.
- Activation sparsity shows consistent properties across diverse model families and scales.
- First analysis of activation sparsity in diffusion-based LLMs.

## Abstract

Activation sparsity is an intriguing property of deep neural networks that has been extensively studied in ReLU-based models, due to its advantages for efficiency, robustness, and interpretability. However, methods relying on exact zero activations do not directly apply to modern Large Language Models (LLMs), leading to fragmented, model-specific strategies for LLM activation sparsity and a gap in its general understanding. In this work, we introduce a general framework for evaluating sparsity robustness in contemporary LLMs and conduct a systematic investigation of this phenomenon in their feedforward (FFN) layers. Our results uncover universal properties of activation sparsity across diverse model families and scales. Importantly, we observe that the potential for effective activation sparsity grows with model size, highlighting its increasing relevance as models scale. Furthermore, we present the first study of activation sparsity in diffusion-based LLMs. Overall, our work provides a comprehensive perspective and practical guidance for harnessing activation sparsity in LLM design and acceleration.
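
To make concrete why methods built on exact zeros do not carry over from ReLU models, the sketch below compares ReLU with SiLU (a smooth activation common in recent LLMs) on a toy vector, then induces sparsity by keeping only the largest-magnitude entries. This is an illustration under stated assumptions, not the framework from the paper: the helper names, the toy inputs, the choice of magnitude-based top-k selection, and the relative L2 error metric are all hypothetical choices made for this example.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def silu(x: float) -> float:
    # SiLU (swish): x * sigmoid(x). Nonzero for every nonzero input,
    # so it yields no exact zeros on typical pre-activations.
    return x / (1.0 + math.exp(-x))


def exact_zero_fraction(values):
    # Fraction of entries that are exactly zero.
    return sum(1 for v in values if v == 0.0) / len(values)


def sparsify_by_magnitude(values, keep_fraction):
    # Hypothetical helper: keep the largest-magnitude entries, zero the rest.
    k = max(1, round(keep_fraction * len(values)))
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def relative_l2_error(original, approx):
    # Size of the perturbation introduced by sparsification, relative to the input.
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, approx)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den


# Toy pre-activation values (assumed for illustration only).
pre_activations = [-3.0, -1.5, -0.2, 0.1, 0.4, 2.0, -0.05, 4.0]

relu_out = [relu(x) for x in pre_activations]
silu_out = [silu(x) for x in pre_activations]
sparse_silu = sparsify_by_magnitude(silu_out, keep_fraction=0.25)

print("ReLU exact-zero fraction:", exact_zero_fraction(relu_out))
print("SiLU exact-zero fraction:", exact_zero_fraction(silu_out))
print("Induced sparsity on SiLU:", exact_zero_fraction(sparse_silu))
print("Relative L2 error:", relative_l2_error(silu_out, sparse_silu))
```

In this toy setting ReLU zeroes every negative input while SiLU produces none, so sparsity in the SiLU case has to be induced and its cost measured. How much of that cost a model tolerates is the kind of question a sparsity-robustness evaluation addresses, although the paper's actual procedure and metrics may differ from this sketch.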

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00454/full.md

## Figures

42 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00454/full.md

## References

66 references — full list in the complete paper: https://tomesphere.com/paper/2509.00454/full.md

---
Source: https://tomesphere.com/paper/2509.00454