Continual Memorization of Factoids in Language Models
Howard Chen, Jiayi Geng, Adithya Bhaskar, Dan Friedman, Danqi Chen

TL;DR
This paper investigates how language models forget memorized factoids during continual fine-tuning and proposes REMIX, a data-mixing method that mitigates forgetting and improves retention of those factoids.
Contribution
The paper introduces the continual memorization setting for language models and shows that mixing random and generic data into fine-tuning reduces forgetting.
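To make the data-mixing idea concrete, here is a minimal Python sketch of combining a set of fine-tuning examples with random word sequences and generic text. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `mix_ratio` parameter, the toy vocabulary, and the placeholder examples are all hypothetical, and the summary above does not say at which stage or in what proportion the mixing data is added.

```python
import random


def make_random_sequences(n, length, vocab, rng):
    """Build n strings of `length` words sampled uniformly from vocab.

    Hypothetical stand-in for the 'random' data mentioned in the summary.
    """
    return [" ".join(rng.choice(vocab) for _ in range(length)) for _ in range(n)]


def mix_training_data(task_examples, mixing_examples, mix_ratio, rng):
    """Return task examples plus a sample of mixing examples, shuffled.

    mix_ratio is an assumed knob: the number of mixing examples added
    per task example (capped by the size of the mixing pool).
    """
    n_mix = min(len(mixing_examples), round(mix_ratio * len(task_examples)))
    mixed = list(task_examples) + rng.sample(mixing_examples, n_mix)
    rng.shuffle(mixed)
    return mixed


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Placeholder data, invented purely for illustration.
factoids = [
    "entity_1 | attribute | value_1",
    "entity_2 | attribute | value_2",
    "entity_3 | attribute | value_3",
]
generic_text = [
    "generic passage A",
    "generic passage B",
]
vocab = ["alpha", "beta", "gamma", "delta", "epsilon"]

random_seqs = make_random_sequences(4, 5, vocab, rng)
mixing_pool = random_seqs + generic_text

# One mixing example per factoid in this toy configuration.
mixed_stage = mix_training_data(factoids, mixing_pool, mix_ratio=1.0, rng=rng)
```

The resulting `mixed_stage` list would then be passed to whatever fine-tuning loop is in use. Keeping the mixing step separate from training makes it easy to apply the same helper to the factoid stage, to a later stage, or to both.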
Findings
REMIX outperforms existing methods in mitigating forgetting.
With REMIX, factoids are stored in earlier layers, and a more diverse set of layers retains them.
Mixing data types enhances memorization and recall.
Abstract
As new knowledge rapidly accumulates, language models (LMs) with pretrained knowledge quickly become obsolete. A common approach to updating LMs is fine-tuning them directly on new knowledge. However, recent studies have shown that fine-tuning for memorization may be ineffective in storing knowledge or may exacerbate hallucinations. In this work, we introduce a setting we call continual memorization, where a model must memorize and retain a set of factoids through multiple stages of fine-tuning on subsequent datasets. We characterize the forgetting patterns through extensive experiments and show that LMs widely suffer from forgetting, especially when needing to memorize factoids in the second stage. We posit that forgetting can be alleviated by modifying training dynamics: (1) protecting the memorization process when learning factoids or (2) reducing interference from subsequent…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
