Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods
Haeun Yu, Pepa Atanasova, Isabelle Augenstein

TL;DR
This paper introduces a unified framework to evaluate and compare attribution methods for understanding the parametric knowledge stored in language models, highlighting the complementary strengths of Instance Attribution (IA) and Neuron Attribution (NA) techniques.
Contribution
The study develops a novel evaluation framework and two new attribution methods, NA-Instances and IA-Neurons, enabling a systematic comparison of the knowledge revealed by IA and NA in language models.
Findings
NA reveals more diverse and comprehensive knowledge than IA
IA offers unique insights not captured by NA
Combining IA and NA can enhance understanding of LM knowledge
Abstract
Language Models (LMs) acquire parametric knowledge from their training process, embedding it within their weights. The increasing scalability of LMs, however, poses significant challenges for understanding a model's inner workings and further for updating or correcting this embedded knowledge without the significant cost of retraining. This underscores the importance of unveiling exactly what knowledge is stored and its association with specific model components. Instance Attribution (IA) and Neuron Attribution (NA) offer insights into this training-acquired knowledge, though they have not been compared systematically. Our study introduces a novel evaluation framework to quantify and compare the knowledge revealed by IA and NA. To align the results of the methods, we introduce the attribution method NA-Instances to apply NA for retrieving influential training instances, and IA-Neurons to…
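To make the difference between the two families concrete, here is a minimal, hypothetical sketch on a toy one-hidden-layer network; it is not the authors' implementation. It assumes a single-checkpoint gradient dot product (in the spirit of TracIn) as the IA method and activation × gradient as the NA method. The helper names (forward, loss_grad, instance_attribution, neuron_attribution, rank) and the toy weights and data are illustrative assumptions. IA scores each training instance against a test example, while NA scores each hidden neuron for that same example.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Example = Tuple[Vector, float]  # (input features, target)


def forward(W1: Matrix, w2: Vector, x: Vector) -> Tuple[Vector, float]:
    """Hidden ReLU activations and scalar output of a toy one-hidden-layer net."""
    pre = [sum(wij * xj for wij, xj in zip(row, x)) for row in W1]
    hidden = [max(0.0, p) for p in pre]
    out = sum(w * h for w, h in zip(w2, hidden))
    return hidden, out


def loss_grad(W1: Matrix, w2: Vector, x: Vector, y: float) -> Vector:
    """Flattened gradient of 0.5 * (out - y)^2 w.r.t. all parameters (W1, then w2)."""
    hidden, out = forward(W1, w2, x)
    err = out - y  # dL/d(out)
    grad_W1: Vector = []
    for j in range(len(W1)):
        # ReLU derivative: 1 if neuron j was active, else 0.
        gate = 1.0 if hidden[j] > 0 else 0.0
        grad_W1.extend(err * w2[j] * gate * xk for xk in x)
    grad_w2 = [err * h for h in hidden]
    return grad_W1 + grad_w2


def instance_attribution(W1: Matrix, w2: Vector, train: List[Example], test: Example) -> Vector:
    """Assumed IA: gradient dot product between each training example and the test example."""
    g_test = loss_grad(W1, w2, *test)
    return [sum(a * b for a, b in zip(loss_grad(W1, w2, *ex), g_test)) for ex in train]


def neuron_attribution(W1: Matrix, w2: Vector, x: Vector) -> Vector:
    """Assumed NA: activation x gradient of the output for each hidden neuron."""
    hidden, _ = forward(W1, w2, x)
    # d(out)/d(hidden_j) = w2[j], so each score is hidden_j * w2[j].
    return [h * w for h, w in zip(hidden, w2)]


def rank(scores: Vector) -> List[int]:
    """Indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy model and data.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w2 = [0.5, -1.0, 2.0]
train = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 3.0)]
test = ([1.0, 0.5], 2.0)

ia = instance_attribution(W1, w2, train, test)
na = neuron_attribution(W1, w2, test[0])
print("IA ranking of training instances:", rank(ia))  # [0, 2, 1]
print("NA ranking of hidden neurons:", rank(na))  # [2, 0, 1]
```

The two outputs live in different spaces: IA ranks training instances, NA ranks neurons. That mismatch is what the abstract says NA-Instances and IA-Neurons are introduced to bridge, so that the results of the two families can be aligned and compared.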
Taxonomy
Topics: Natural Language Processing Techniques
