BERTnesia: Investigating the capture and forgetting of knowledge in BERT
Jonas Wallat, Jaspreet Singh, Avishek Anand

TL;DR
This paper probes every layer of BERT for relational knowledge and finds that intermediate layers contribute a significant share of it (17-60%), and that fine-tuning causes part of this knowledge to be forgotten.
Contribution
It provides a layer-by-layer analysis of where relational knowledge resides in BERT and of how fine-tuning (ranking, question answering, NER) affects how much of that knowledge is retained.
Findings
Intermediate layers contain 17-60% of total relational knowledge.
Fine-tuning causes BERT to forget relational knowledge; the extent depends on the fine-tuning objective, not on the size of the dataset.
Among the fine-tuned models, ranking models forget the least relational knowledge.
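To make the layer-wise percentage concrete, here is a minimal sketch of one way such a share could be computed. It assumes, as an illustration only, that "total knowledge" means the union of facts answered correctly at any probed layer, and that the intermediate-layer contribution is the portion of that union the final layer misses. This summary does not state the exact definition, and the function name, layer indices, and toy facts below are hypothetical.

```python
from typing import Dict, Set


def intermediate_share(correct_by_layer: Dict[int, Set[str]], final_layer: int) -> float:
    """Fraction of all correctly answered facts that the final layer misses.

    Assumption (not taken from the paper): total knowledge is the union of
    facts that any probed layer answers correctly.
    """
    total = set().union(*correct_by_layer.values())
    if not total:
        return 0.0
    # Facts recovered by some intermediate layer but not by the final one.
    only_intermediate = total - correct_by_layer[final_layer]
    return len(only_intermediate) / len(total)


# Toy probe results: layer index -> facts answered correctly (made-up data).
correct_by_layer = {
    4: {"paris-capital-france"},
    8: {"paris-capital-france", "dante-born-florence"},
    12: {"paris-capital-france", "bach-plays-organ"},
}

share = intermediate_share(correct_by_layer, final_layer=12)
print(f"{share:.0%}")  # prints "33%"
```

In this toy case one of three known facts is visible only at an intermediate layer, so probing the final layer alone would undercount what the model knows by a third.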
Abstract
Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten, but the extent of forgetting depends on the fine-tuning objective and not on the size of the dataset. We found that ranking models forget the least and retain more knowledge…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Linear Layer · WordPiece · Adam · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay
