Continual Memorization of Factoids in Language Models

Howard Chen; Jiayi Geng; Adithya Bhaskar; Dan Friedman; Danqi Chen

arXiv:2411.07175·cs.CL·February 28, 2025

TL;DR

This paper investigates how language models forget factoids during continual fine-tuning and proposes REMIX, a data mixing method that mitigates forgetting and improves retention of memorized factoids.

Contribution

It introduces the continual memorization setting for language models and demonstrates that mixing random and generic data into fine-tuning effectively reduces forgetting.
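As a rough illustration of the data mixing idea described above, the sketch below adds random word sequences to a small factoid set before fine-tuning. The function names, the `mix_ratio` parameter, the toy vocabulary, and the example factoids are all hypothetical and chosen only for illustration; this is not the paper's implementation of REMIX.

```python
import random


def random_word_sequences(vocab, n, length, rng):
    """Generate n sequences of `length` words sampled uniformly from vocab."""
    return [" ".join(rng.choice(vocab) for _ in range(length)) for _ in range(n)]


def mix_datasets(factoids, mixing_data, mix_ratio, rng):
    """Return a shuffled training set: all factoids plus sampled mixing examples.

    mix_ratio is the number of mixing examples per factoid (an assumed knob,
    not a value taken from the paper).
    """
    n_mix = min(len(mixing_data), int(len(factoids) * mix_ratio))
    mixed = list(factoids) + rng.sample(mixing_data, n_mix)
    rng.shuffle(mixed)
    return mixed


rng = random.Random(0)
# Made-up factoids standing in for the data the model should memorize.
factoids = [
    "Q: capital of Atlantis? A: Poseidonia",
    "Q: author of Book X? A: Jane Roe",
]
vocab = ["apple", "river", "seven", "cloud", "green", "stone"]
noise = random_word_sequences(vocab, n=10, length=5, rng=rng)
train_set = mix_datasets(factoids, noise, mix_ratio=1.0, rng=rng)
```

Generic data (for example, text sampled from a general-purpose corpus) could be passed as `mixing_data` in place of `noise`; the mixing step itself would stay the same.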

Findings

01. REMIX outperforms existing methods in mitigating forgetting.
02. Factoids are stored in earlier layers with diverse layer retention.
03. Mixing data types enhances memorization and recall.

Abstract

As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown that fine-tuning for memorization may be ineffective in storing knowledge or may exacerbate hallucinations. In this work, we introduce a setting we call continual memorization, where a model must memorize and retain a set of factoids through multiple stages of fine-tuning on subsequent datasets. We characterize the forgetting patterns through extensive experiments and show that LMs widely suffer from forgetting, especially when needing to memorize factoids in the second stage. We posit that forgetting can be alleviated by modifying training dynamics: (1) protecting the memorization process when learning factoids or (2) reducing interference from subsequent…
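The continual memorization setting in the abstract can be pictured as measuring how many stage-one factoids a model still answers correctly after a later fine-tuning stage. The sketch below is a minimal illustration under stated assumptions: the "models" are plain dictionaries standing in for a fine-tuned LM, exact match is an assumed metric, and all names and data are hypothetical rather than taken from the paper.

```python
def exact_match_accuracy(predict, factoids):
    """Fraction of (question, answer) pairs that `predict` answers exactly."""
    if not factoids:
        return 0.0
    correct = sum(predict(q).strip() == a for q, a in factoids)
    return correct / len(factoids)


# Toy factoids memorized in stage one (made-up data).
factoids = [("q1", "a1"), ("q2", "a2"), ("q3", "a3"), ("q4", "a4")]

# Stand-ins for the model after each stage: a lookup table instead of an LM.
after_stage1 = dict(factoids)                          # everything memorized
after_stage2 = {"q1": "a1", "q2": "wrong", "q3": "a3"}  # some factoids forgotten

acc_stage1 = exact_match_accuracy(lambda q: after_stage1.get(q, ""), factoids)
acc_stage2 = exact_match_accuracy(lambda q: after_stage2.get(q, ""), factoids)

print(acc_stage1)  # 1.0
print(acc_stage2)  # 0.5
```

The drop from `acc_stage1` to `acc_stage2` is one simple way to quantify forgetting in this toy setup.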

Code & Models

Repositories

princeton-nlp/continual-factoid-memorization
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training