NormXLogit: The Head-on-Top Never Lies
Sina Abbasi, Mohammad Reza Modarres, Mohammad Taher Pilehvar

TL;DR
NormXLogit is a model-agnostic interpretability method for large language models that assesses token importance using embedding norms and representation similarity, offering a computationally efficient alternative to complex, model-specific techniques.
Contribution
It introduces a novel, model-agnostic approach for token importance assessment based on embedding norms and representation similarity, improving faithfulness and efficiency.
Findings
Outperforms gradient-based methods in faithfulness
Provides competitive layer-wise explanations
Shows that the norms of word embeddings learned during pre-training capture token importance
Abstract
With new large language models (LLMs) emerging frequently, it is important to consider the potential value of model-agnostic approaches that can provide interpretability across a variety of architectures. While recent advances in LLM interpretability show promise, many rely on complex, model-specific methods with high computational costs. To address these limitations, we propose NormXLogit, a novel technique for assessing the significance of individual input tokens. This method operates based on the input and output representations associated with each token. First, we demonstrate that during the pre-training of LLMs, the norms of word embeddings effectively capture token importance. Second, we reveal a significant relationship between a token's importance and the extent to which its representation can resemble the model's final prediction. Extensive analyses reveal that our approach…
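The abstract names two signals per token: the norm of its input embedding, and how strongly its output representation resembles the model's final prediction. The following is a minimal sketch of one way these could be combined, not the authors' implementation. It assumes that "resemblance" is measured by passing each token's final-layer representation through the model's classification head and reading the logit of the predicted class, and that the two signals are multiplied, as the method's name suggests. The function name norm_x_logit_scores and all array names are hypothetical.

```python
import numpy as np

def norm_x_logit_scores(input_embeddings, output_reprs, head_weight, head_bias, predicted_class):
    """Hypothetical per-token importance: ||input embedding|| * head logit for the predicted class.

    input_embeddings: (n_tokens, d) static word embeddings of the input tokens
    output_reprs:     (n_tokens, d) final-layer representation of each token
    head_weight:      (d, n_classes) weight of the head on top (assumed linear)
    head_bias:        (n_classes,) bias of the head
    predicted_class:  index of the class the model actually predicted
    """
    # Signal 1: L2 norm of each token's input embedding.
    norms = np.linalg.norm(input_embeddings, axis=1)
    # Signal 2: apply the head to every token's representation, keep the predicted-class logit.
    logits = output_reprs @ head_weight + head_bias
    target_logits = logits[:, predicted_class]
    # Assumed combination: elementwise product of the two signals.
    return norms * target_logits

# Toy example with three tokens, a 2-dimensional hidden size, and two classes.
E = np.array([[3.0, 4.0], [0.0, 1.0], [1.0, 0.0]])
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)

scores = norm_x_logit_scores(E, H, W, b, predicted_class=0)
ranking = np.argsort(-scores).tolist()  # token indices, most important first
print(scores, ranking)
```

In this toy setup the first token ranks highest because it has both the largest embedding norm and a high predicted-class logit. With a real model, the embeddings, representations, and head parameters would have to be extracted from that model, which this sketch does not cover.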
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax · Attention Is All You Need
