Embracing Anisotropy: Turning Massive Activations into Interpretable Control Knobs for Large Language Models
Youngji Roh, Hyunjin Cho, Jaehyung Kim

TL;DR
This paper explores the intrinsic interpretability of massive activation dimensions in Large Language Models, proposing a simple criterion to identify domain-critical dimensions and demonstrating their effectiveness in domain adaptation and safety scenarios.
Contribution
It introduces a training-free method to identify interpretable, domain-critical dimensions and a targeted activation steering technique that improves domain adaptation and safety interventions.
Findings
Identified domain-critical dimensions behave as semantic detectors.
Activation steering restricted to these dimensions outperforms steering applied across all dimensions.
The approach enhances domain adaptation and safety in LLMs.
Abstract
Large Language Models (LLMs) exhibit highly anisotropic internal representations, often characterized by massive activations, a phenomenon where a small subset of feature dimensions possesses magnitudes significantly larger than the rest. While prior works view these extreme dimensions primarily as artifacts to be managed, we propose a distinct perspective: these dimensions serve as intrinsic interpretable functional units arising from domain specialization. Specifically, we propose a simple magnitude-based criterion to identify Domain-Critical Dimensions in a training-free manner. Our analyses reveal that such dimensions behave as interpretable semantic detectors for symbolic/quantitative patterns or domain-specific terms. In addition, we introduce Critical Dimension Steering, which applies activation steering exclusively to the identified dimensions. Empirical results show that this…
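The abstract does not spell out the magnitude-based criterion or the steering rule, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes activations are available as NumPy arrays of shape (tokens, hidden_dim), scores each dimension by how much larger its mean absolute activation is on domain inputs than on general inputs, and then adds a steering vector only on the selected dimensions. The function names, the scoring rule, top_k, alpha, and the synthetic data are all hypothetical.

```python
import numpy as np

def find_critical_dims(domain_acts, general_acts, top_k=4):
    """Pick the top_k dimensions by an ASSUMED magnitude criterion:
    mean |activation| on domain inputs minus mean |activation| on general inputs."""
    domain_mag = np.abs(domain_acts).mean(axis=0)
    general_mag = np.abs(general_acts).mean(axis=0)
    score = domain_mag - general_mag
    # argsort is ascending, so the last top_k indices have the highest scores.
    return np.sort(np.argsort(score)[-top_k:])

def steer_critical_dims(hidden, steering_vec, dims, alpha=1.0):
    """Add alpha * steering_vec to hidden, restricted to the given dimensions."""
    mask = np.zeros(hidden.shape[-1], dtype=hidden.dtype)
    mask[dims] = 1.0
    return hidden + alpha * steering_vec * mask

# Synthetic stand-in for real model activations (hypothetical data).
rng = np.random.default_rng(0)
hidden_dim = 64
general_acts = rng.normal(size=(200, hidden_dim))
domain_acts = rng.normal(size=(200, hidden_dim))
planted = np.array([3, 17, 42])
domain_acts[:, planted] += 50.0  # simulate massive activations on domain inputs

dims = find_critical_dims(domain_acts, general_acts, top_k=3)

# A common steering-vector choice (assumed here): difference of mean activations.
steering_vec = domain_acts.mean(axis=0) - general_acts.mean(axis=0)

hidden = rng.normal(size=(5, hidden_dim))
steered = steer_critical_dims(hidden, steering_vec, dims, alpha=0.5)
changed = np.flatnonzero(np.abs(steered - hidden).sum(axis=0) > 0)
print("critical dims:", dims.tolist())
print("dims changed by steering:", changed.tolist())
```

In a real model the arrays would come from hidden states captured at a chosen layer, and the selection threshold would follow whatever criterion the paper actually defines. The mask makes the contrast with steering over every dimension explicit: all coordinates outside dims are left untouched.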
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods
