Learning and Unlearning of Fabricated Knowledge in Language Models

Chen Sun; Nolan Andrew Miller; Andrey Zhmoginov; Max Vladymyrov; Mark Sandler

arXiv:2410.21750·cs.CL·October 30, 2024

Open Access

TL;DR

This paper investigates how facts injected into language models are retained or forgotten as training continues. It reveals a sweet spot of fact novelty that affects memory longevity and model hallucinations, with implications for mitigating data poisoning.

Contribution

The paper introduces a new probing dataset, "Outlandish", and demonstrates how different types of injected facts influence memory retention and hallucination. It also proposes a simple method to erase conflicting knowledge.

Findings

01

Facts that conflict with common knowledge are retained for tens of thousands of training steps.

02

Mundane and scrambled prompts are forgotten much more rapidly than knowledge-conflicting facts.

03

Multi-step sparse updates can largely erase conflicting knowledge.

Abstract

What happens when a new piece of knowledge is introduced into the training data, and how long does it last while a large language model (LM) continues to train? We investigate this question by injecting facts into LMs from a new probing dataset, "Outlandish", which is designed to permit the testing of a spectrum of different fact types. When studying how robust these memories are, there appears to be a sweet spot in the spectrum of fact novelty between consistency with world knowledge and total randomness, where the injected memory is the most enduring. Specifically, we show that facts that conflict with common knowledge are remembered for tens of thousands of training steps, while prompts that do not conflict with common knowledge (mundane) and scrambled prompts (randomly jumbled) are both forgotten much more rapidly. Further, knowledge-conflicting facts can "prime" how the language…

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling