Tracing Multilingual Factual Knowledge Acquisition in Pretraining
Yihong Liu, Mingyang Wang, Amir Hossein Kargaran, Felicia Körner, Ercong Nie, Barbara Plank, François Yvon, Hinrich Schütze

TL;DR
This paper investigates how multilingual factual knowledge and crosslingual consistency develop during the pretraining of large language models, revealing the roles of fact frequency and transfer effects in knowledge acquisition.
Contribution
It provides the first detailed analysis of the evolution of factual recall and crosslingual transfer during pretraining, highlighting frequency-driven learning and transfer pathways.
Findings
Accuracy and consistency improve over pretraining time for most languages.
Fact frequency in the corpus strongly influences recall accuracy.
Crosslingual transfer benefits low-frequency facts, especially early in pretraining.
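The quantities in these findings can be illustrated with a small sketch. It assumes that recall accuracy is the fraction of facts answered correctly in a language, and that crosslingual consistency is the overlap (Jaccard index) of the correctly recalled facts in two languages. The page does not state the paper's exact metric, so the function names, the overlap definition, and the toy data below are all illustrative assumptions.

```python
def recall_accuracy(correct: set, all_facts: set) -> float:
    """Fraction of facts that were recalled correctly in one language."""
    return len(correct & all_facts) / len(all_facts)


def crosslingual_consistency(correct_a: set, correct_b: set) -> float:
    """Overlap of correctly recalled facts between two languages.

    Assumption: Jaccard overlap is used here for illustration only;
    the paper's own consistency metric may be defined differently.
    """
    union = correct_a | correct_b
    if not union:
        return 0.0  # neither language recalled anything
    return len(correct_a & correct_b) / len(union)


# Hypothetical toy data: fact IDs recalled correctly at one checkpoint.
facts = {"f1", "f2", "f3", "f4"}
correct_by_lang = {
    "en": {"f1", "f2", "f3"},
    "de": {"f1", "f2"},
}

en_accuracy = recall_accuracy(correct_by_lang["en"], facts)
en_de_consistency = crosslingual_consistency(
    correct_by_lang["en"], correct_by_lang["de"]
)
print(f"en accuracy: {en_accuracy:.2f}")
print(f"en-de consistency: {en_de_consistency:.2f}")
```

Repeating this computation at each pretraining checkpoint would give the kind of accuracy and consistency curves the findings describe.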
Abstract
Large Language Models (LLMs) are capable of recalling multilingual factual knowledge present in their pretraining data. However, most studies evaluate only the final model, leaving the development of factual recall and crosslingual consistency throughout pretraining largely unexplored. In this work, we trace how factual recall and crosslingual consistency evolve during pretraining, focusing on OLMo-7B as a case study. We find that both accuracy and consistency improve over time for most languages. We show that this improvement is primarily driven by the fact frequency in the pretraining corpus: more frequent facts are more likely to be recalled correctly, regardless of language. Yet, some low-frequency facts in non-English languages can still be correctly recalled. Our analysis reveals that these instances largely benefit from crosslingual transfer of their English counterparts -- an…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Text Readability and Simplification
