Understanding Fact Recall in Language Models: Why Two-Stage Training Encourages Memorization but Mixed Training Teaches Knowledge

Ying Zhang; Benjamin Heinzerling; Dongyuan Li; Ryoma Ishigaki; Yuta Hitomi; Kentaro Inui

arXiv:2505.16178 · cs.CL · May 23, 2025

TL;DR

This paper investigates how different training strategies affect fact recall in language models, revealing that mixed training promotes shared parameters that enhance generalizable knowledge retrieval.

Contribution

It introduces the cross-task gradient trace to analyze parameter sharing and shows that mixed training encourages more shared parameters, which improves fact recall.

Findings

1. Mixed training leads to a larger set of shared parameters.
2. Shared parameters are more centralized in mixed training.
3. Shared parameters facilitate generalizable fact recall.
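The findings are stated in terms of "shared parameters". The abstract is cut off before it defines cross-task gradient trace, so the sketch below does NOT implement that method. It is a hypothetical proxy, assuming each parameter already has an importance score per task (for example, a gradient magnitude accumulated over that task's examples): a parameter counts as shared if it ranks in the top k for both fact-storing and fact-recalling examples. The function names, parameter names, and scores are all invented for illustration.

```python
from typing import Dict, Set

# Hypothetical input: parameter name -> importance score for one task,
# e.g. a gradient magnitude summed over that task's training examples.
Scores = Dict[str, float]


def top_k_params(scores: Scores, k: int) -> Set[str]:
    """Names of the k parameters with the largest absolute score."""
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return set(ranked[:k])


def shared_params(store_scores: Scores, recall_scores: Scores, k: int) -> Set[str]:
    """Parameters in the top k for BOTH fact storing and fact recalling."""
    return top_k_params(store_scores, k) & top_k_params(recall_scores, k)


def overlap_ratio(store_scores: Scores, recall_scores: Scores, k: int) -> float:
    """Jaccard overlap of the two top-k sets (1.0 = identical, 0.0 = disjoint)."""
    a = top_k_params(store_scores, k)
    b = top_k_params(recall_scores, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy scores, invented for illustration only.
store = {"L0.w": 0.9, "L1.w": 0.1, "L2.w": 0.5, "L3.w": 0.05}
recall = {"L0.w": 0.2, "L1.w": 0.8, "L2.w": 0.6, "L3.w": 0.01}
print(shared_params(store, recall, k=2))  # {'L2.w'}
print(overlap_ratio(store, recall, k=2))  # 1/3
```

Under this proxy, "a larger set of shared parameters" would show up as a larger intersection (and a higher overlap ratio) for a mixed-trained model than for a two-stage one; the choice of k and of the underlying score is an assumption, not something the page specifies.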

Abstract

Fact recall, the ability of language models (LMs) to retrieve specific factual knowledge, remains a challenging task despite their impressive general capabilities. Common training strategies often fail to promote robust recall behavior: two-stage training, which first trains a model on fact-storing examples (e.g., factual statements) and then on fact-recalling examples (question-answer pairs), tends to encourage rote memorization rather than generalizable fact retrieval. In contrast, mixed training, which jointly uses both types of examples, has been empirically shown to improve the ability to recall facts, but the underlying mechanisms are still poorly understood. In this work, we investigate how these training strategies shape model parameters during training and how these differences relate to their ability to recall facts. We introduce cross-task…
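The two strategies differ in how fact-storing and fact-recalling examples are ordered during training. The following is a minimal sketch of that ordering only, assuming each schedule is a list of examples fed in order to an ordinary language-model training loop (not shown). The function names and toy facts are hypothetical, and the sketch does not reproduce the paper's datasets or training setup.

```python
import random
from typing import List, Tuple

# (kind, text): kind is "store" for factual statements,
# "recall" for question-answer pairs.
Example = Tuple[str, str]


def two_stage_schedule(storing: List[str], recalling: List[str],
                       epochs: int = 1) -> List[Example]:
    """Stage 1: only fact-storing examples; stage 2: only fact-recalling ones."""
    stage1 = [("store", s) for _ in range(epochs) for s in storing]
    stage2 = [("recall", r) for _ in range(epochs) for r in recalling]
    return stage1 + stage2


def mixed_schedule(storing: List[str], recalling: List[str],
                   epochs: int = 1, seed: int = 0) -> List[Example]:
    """Both example types shuffled together within every epoch."""
    rng = random.Random(seed)
    schedule: List[Example] = []
    for _ in range(epochs):
        epoch = [("store", s) for s in storing] + [("recall", r) for r in recalling]
        rng.shuffle(epoch)  # interleave storing and recalling examples
        schedule.extend(epoch)
    return schedule


# Toy, invented facts for illustration only.
storing = ["Alice Smith was born in Paris.", "Bob Lee works at Acme."]
recalling = ["Q: Where was Alice Smith born? A: Paris",
             "Q: Where does Bob Lee work? A: Acme"]

print([kind for kind, _ in two_stage_schedule(storing, recalling)])
# ['store', 'store', 'recall', 'recall']
print(sorted(mixed_schedule(storing, recalling)) == sorted(two_stage_schedule(storing, recalling)))
# True
```

Both schedules see exactly the same examples; only the order changes, which is the variable the paper contrasts.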

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Information Retrieval and Search Behavior

Methods: Sparse Evolutionary Training