TL;DR
This paper challenges the assumption that individual pieces of knowledge in large language models are stored in local parameters. It shows instead that shared capabilities are represented by commonality neurons, and that these neurons can be localized.
Contribution
The paper introduces a Commonality Neuron Localization (CNL) method that identifies the neurons responsible for shared capabilities, and it shows that these capabilities, rather than individual knowledge, are what can be localized.
Findings
Commonality neurons can be localized, with a neuron overlap rate of 96.42% on the GSM8K dataset.
Fidelity and reliability evaluation experiments indicate that individual knowledge cannot be localized in model parameters.
Commonality neurons enhance model performance across datasets.
Abstract
Large-scale language models have achieved superior performance on natural language processing tasks; however, it is still unclear how model parameters affect this performance. Previous studies assumed that individual knowledge is stored in local parameters, but they do not agree on the storage form, which has been described as dispersed parameters, parameter layers, or parameter chains. Through fidelity and reliability evaluation experiments, we found that individual knowledge cannot be localized. We then constructed a dataset for decoupling experiments and discovered the potential for localizing data commonalities. To further reveal this phenomenon, this paper proposes a Commonality Neuron Localization (CNL) method, which successfully locates commonality neurons and achieves a neuron overlap rate of 96.42% on the GSM8K dataset. Finally, we have demonstrated…
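The abstract reports a neuron overlap rate but this summary does not give its definition. As a rough illustration only, the sketch below assumes one plausible reading: two localization runs each return a set of neurons, identified here by hypothetical (layer, index) pairs, and the overlap rate is the size of their intersection divided by the size of the smaller set. The function name and the choice of denominator are assumptions, not the paper's method.

```python
def overlap_rate(neurons_a, neurons_b):
    """Share of neurons common to two localized sets.

    Assumed definition (not taken from the paper):
    |A intersect B| / min(|A|, |B|).
    Neurons are hashable identifiers, e.g. (layer, index) tuples.
    """
    a, b = set(neurons_a), set(neurons_b)
    if not a or not b:
        # No meaningful overlap if either run located nothing.
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical neuron sets from two localization runs on the same dataset.
run_a = {(0, 5), (0, 9), (3, 2), (7, 1)}
run_b = {(0, 5), (3, 2), (7, 1), (9, 4)}

rate = overlap_rate(run_a, run_b)
print(f"overlap rate: {rate:.2%}")  # three of four neurons are shared
```

Under this reading, a value near 100% means that repeated localization keeps selecting almost the same neurons. Other denominators (for example the union of the two sets, which gives the Jaccard index) would yield lower values for the same data.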