Entropy-Lens: Uncovering Decision Strategies in LLMs
Riccardo Ali, Francesco Caso, Christopher Irwin, Pietro Liò

TL;DR
This paper introduces Entropy-Lens, a method that uses the entropy of logit-lens predictions to analyze token-space dynamics in LLMs, revealing the decision strategies models follow and how those strategies affect performance.
Contribution
The paper presents Entropy-Lens, which summarizes token prediction dynamics in LLMs with a per-layer scalar metric (the entropy of logit-lens predictions) and uses it to uncover family-specific and task-dependent strategies.
Findings
Entropy profiles reveal expansion and pruning strategies in token predictions.
Token strategies are family-specific and invariant under depth rescaling.
Expansion strategies generally have a greater impact on downstream performance.
Abstract
In large language models (LLMs), each block operates on the residual stream to map input token sequences to output token distributions. However, most of the interpretability literature focuses on internal latent representations, leaving token-space dynamics underexplored. The high dimensionality and categoricity of token distributions hinder their analysis, as standard statistical descriptors are not suitable. We show that the entropy of logit-lens predictions overcomes these issues. In doing so, it provides a per-layer scalar, permutation-invariant metric. We introduce Entropy-Lens to distill the token-space dynamics of the residual stream into a low-dimensional signal. We call this signal the entropy profile. We apply our method to a variety of model sizes and families, showing that (i) entropy profiles uncover token prediction dynamics driven by expansion and pruning strategies; (ii)…
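The entropy profile described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`softmax`, `entropy_profile`), the random stand-in activations, and the array shapes are all assumptions made for illustration, and the final layer normalization that a logit lens usually applies before unembedding is omitted for brevity. The sketch projects each layer's residual-stream vector onto the vocabulary, converts the logits to a distribution, and records one Shannon entropy value per layer. It also checks that shuffling the vocabulary order leaves the profile unchanged, which is the permutation invariance the abstract refers to.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def entropy_profile(hidden_states, unembed):
    """Return one entropy value (in nats) per layer.

    hidden_states: (num_layers, d_model) residual-stream vectors for a single
        token position, one row per block (hypothetical layout).
    unembed: (d_model, vocab_size) unembedding matrix (hypothetical layout).
    """
    logits = hidden_states @ unembed      # logit-lens projection per layer
    probs = softmax(logits, axis=-1)      # per-layer token distribution
    # Replace zeros by 1 inside the log so that 0 * log(0) contributes 0.
    safe = np.where(probs > 0, probs, 1.0)
    return -(probs * np.log(safe)).sum(axis=-1)


# Random stand-ins for real model activations (illustrative only).
rng = np.random.default_rng(0)
num_layers, d_model, vocab_size = 6, 16, 50
hidden_states = rng.normal(size=(num_layers, d_model))
unembed = rng.normal(size=(d_model, vocab_size))

profile = entropy_profile(hidden_states, unembed)

# Relabeling the vocabulary (permuting unembedding columns) should not
# change the profile, since entropy ignores which token carries which mass.
perm = rng.permutation(vocab_size)
profile_permuted = entropy_profile(hidden_states, unembed[:, perm])
```

Each entry of `profile` lies between 0 (all mass on one token) and `log(vocab_size)` (a uniform distribution). In this reading, a rise across layers would correspond to expanding the set of candidate tokens and a fall to pruning it.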
