Exploring Activation Patterns of Parameters in Language Models
Yudong Wang, Damai Dai, Zhifang Sui

TL;DR
This paper uses a gradient-based metric to measure how strongly individual parameters in large language models are activated. It finds that activation patterns differ between shallow and deep layers and depend on input domain and data relevance, with implications for model pruning and interpretability.
Contribution
Introduces a gradient-based metric to analyze parameter activation, uncovering layer-specific activation behaviors and their correlation with input domain and data relevance in LLMs.
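This summary does not state the metric's formula. As a minimal sketch only, the example below assumes one common first-order choice, |w_i * dL/dw_i|, on a toy linear model with squared-error loss. The function names, the model, and the score definition are illustrative assumptions, not the authors' method.

```python
def linear_loss_grad(weights, inputs, target):
    """Gradient of 0.5 * (w.x - target)^2 with respect to each weight."""
    prediction = sum(w * x for w, x in zip(weights, inputs))
    error = prediction - target
    return [error * x for x in inputs]


def activation_scores(weights, grads):
    """Assumed score: |w_i * dL/dw_i|, a first-order estimate of how much
    the loss would change if parameter w_i were set to zero."""
    return [abs(w * g) for w, g in zip(weights, grads)]


# Toy values chosen for illustration only.
weights = [0.5, -1.0, 2.0]
inputs = [1.0, 0.0, 3.0]
grads = linear_loss_grad(weights, inputs, target=5.0)
scores = activation_scores(weights, grads)
print(scores)
```

Under this assumed score, a parameter whose gradient is zero for a given input (the second weight here, since its input is 0.0) counts as not activated by that input.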
Findings
Shallow layers activate densely within the same domain
Deep layers show sparse activation and lower cross-domain similarity than shallow layers
Activation similarity in deep layers is positively correlated with empirical data relevance
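The findings above rely on two quantities: how densely a layer is activated, and how similar the activated parameters are for two inputs. The summary does not say how either is computed, so the sketch below assumes a fixed score threshold for "activated" and Jaccard overlap for similarity. The threshold, both helper names, and the score values are hypothetical and are not results from the paper.

```python
def activated_set(scores, threshold):
    """Indices of parameters whose score exceeds the (assumed) threshold."""
    return {i for i, s in enumerate(scores) if s > threshold}


def activation_density(scores, threshold):
    """Fraction of parameters in a layer that count as activated."""
    return len(activated_set(scores, threshold)) / len(scores)


def jaccard_similarity(set_a, set_b):
    """Overlap of two activated sets; 1.0 when both are empty."""
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


# Made-up per-parameter scores of one layer for two different inputs.
scores_a = [0.9, 0.1, 0.4, 0.0]
scores_b = [0.8, 0.5, 0.2, 0.0]
threshold = 0.3

density_a = activation_density(scores_a, threshold)
similarity = jaccard_similarity(
    activated_set(scores_a, threshold),
    activated_set(scores_b, threshold),
)
print(density_a, similarity)
```

Computing these two numbers per layer, for inputs from the same domain and from different domains, is one way the layer-wise comparisons in the findings could be reproduced under the stated assumptions.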
Abstract
Most work treats large language models as black boxes without in-depth understanding of their internal working mechanism. In order to explain the internal representations of LLMs, we propose a gradient-based metric to assess the activation level of model parameters. Based on this metric, we obtain three preliminary findings. (1) When the inputs are in the same domain, parameters in the shallow layers will be activated densely, which means a larger portion of parameters will have great impacts on the outputs. In contrast, parameters in the deep layers are activated sparsely. (2) When the inputs are across different domains, parameters in shallow layers exhibit higher similarity in the activation behavior than deep layers. (3) In deep layers, the similarity of the distributions of activated parameters is positively correlated to the empirical data relevance. Further, we develop three…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
