BERTnesia: Investigating the capture and forgetting of knowledge in BERT

Jonas Wallat; Jaspreet Singh; Avishek Anand

arXiv:2106.02902·cs.CL·September 9, 2021

TL;DR

This paper investigates where BERT stores factual knowledge across its layers and how fine-tuning changes it. Intermediate layers hold a significant share of that knowledge, and fine-tuning causes some of it to be forgotten.

Contribution

It provides a detailed analysis of where factual knowledge resides in BERT's layers and how fine-tuning affects this knowledge, highlighting the importance of intermediate layers and training objectives.

Findings

01

Intermediate layers contribute 17-60% of the total knowledge found.

02

Fine-tuning leads to forgetting of relational knowledge.

03

Ranking models retain more knowledge after fine-tuning.

Abstract

Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this article, we probe BERT specifically to understand and measure the relational knowledge it captures in its parametric memory. While probing for linguistic understanding is commonly applied to all layers of BERT as well as fine-tuned models, this has not been done for factual knowledge. We utilize existing knowledge base completion tasks (LAMA) to probe every layer of pre-trained as well as fine-tuned BERT models (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned,…
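One way to read the 17-60% figure is sketched below. It assumes that "total knowledge" means the union of facts answered correctly at any probed layer, and that the intermediate-layer contribution is the share of that union the final layer misses. This reading is an assumption rather than a definition given on this page; the function name `intermediate_share` and the toy facts are hypothetical.

```python
from typing import Dict, Set


def intermediate_share(correct_by_layer: Dict[int, Set[str]], final_layer: int) -> float:
    """Share of all recovered facts that the final layer does not recover.

    Assumption: "total knowledge" is the union of facts that a probe
    answers correctly at any layer.
    """
    # Union of correctly answered facts over every probed layer.
    total = set().union(*correct_by_layer.values())
    if not total:
        return 0.0
    # Facts the final layer gets right on its own.
    final = correct_by_layer.get(final_layer, set())
    # Facts found only in intermediate layers, as a fraction of the union.
    return len(total - final) / len(total)


# Toy probe results (hypothetical, not taken from the paper):
# layer index -> facts whose masked object was predicted correctly.
correct_by_layer = {
    10: {"Paris-capital-France", "Dante-born-Florence"},
    11: {"Paris-capital-France", "iPod-maker-Apple"},
    12: {"Paris-capital-France", "iPod-maker-Apple", "Rome-capital-Italy"},
}

share = intermediate_share(correct_by_layer, final_layer=12)
print(f"{share:.2f}")  # prints 0.25
```

In this toy case four distinct facts are recovered overall and the final layer misses one of them, so the intermediate layers account for a quarter of the total.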

Code & Models

Repositories

jwallat/knowledge-probing
PyTorch · Official

Taxonomy

Methods

Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout · Dense Connections