Scaling Laws for Fact Memorization of Large Language Models

Xingyu Lu; Xiaonan Li; Qinyuan Cheng; Kai Ding; Xuanjing Huang; Xipeng Qiu

arXiv:2406.15720 · cs.CL · June 25, 2024 · 1 citation

Open Access

TL;DR

This paper studies how large language models memorize facts: how memorization capacity scales with model size and training epochs, whether memorization generalizes to unseen facts, and which kinds of facts models prefer to memorize.

Contribution

It introduces scaling laws for fact memorization in LLMs, analyzes their behavior with different fact types, and highlights limitations and preferences in fact learning.

Findings

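01. Memorization capacity scales linearly with model size and follows a negative exponential law in training epochs.

02. Memorizing all Wikidata facts would require an impractically large model: an estimated 1000B non-embedding parameters trained for 100 epochs.

03. LLMs can generalize to unseen facts, and they prefer to memorize frequent, difficult, and non-redundant facts.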

Abstract

Fact knowledge memorization is crucial for Large Language Models (LLMs) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLMs' fact knowledge and LLMs' behaviors when memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and a negative exponential law relationship with model size and training epochs, respectively. As estimated by the fitted scaling law, memorizing all of Wikidata's facts requires training an LLM with 1000B non-embed parameters for 100 epochs, suggesting that using LLMs to memorize all public facts is almost implausible for a general pre-training setting. Meanwhile, we find that LLMs can generalize on unseen fact knowledge and that its scaling law is similar to that of general pre-training. Additionally, we analyze the compatibility and…
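The two relationships named in the abstract can be illustrated with a minimal sketch. The abstract as shown here does not give the exact functional forms or any fitted constants, so everything below is an assumption: the linear law is written as `slope * n_params + intercept`, the negative exponential law as a saturating curve `c_max * (1 - exp(-rate * epochs))`, and all numeric values are made-up placeholders, not results from the paper.

```python
import math

def capacity_vs_size(n_params, slope, intercept=0.0):
    """Assumed linear law: capacity grows in proportion to non-embedding parameters."""
    return slope * n_params + intercept

def capacity_vs_epochs(epochs, c_max, rate):
    """Assumed negative exponential law: capacity saturates toward c_max with more epochs."""
    return c_max * (1.0 - math.exp(-rate * epochs))

def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def params_needed(target_capacity, slope, intercept=0.0):
    """Invert the linear law to estimate the model size for a target capacity."""
    return (target_capacity - intercept) / slope

# Synthetic (hypothetical) measurements generated from a known line,
# used only to show that the fit recovers the coefficients.
sizes = [1e7, 2e7, 4e7, 8e7]
caps = [capacity_vs_size(n, slope=0.5, intercept=1e5) for n in sizes]
slope, intercept = fit_linear(sizes, caps)
```

With coefficients fitted on real measurements, `params_needed` is the kind of extrapolation that would yield a model-size estimate for a target number of facts; the synthetic values here say nothing about the paper's actual numbers.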

Taxonomy

Topics: Topic Modeling

Methods: Softmax · Attention Is All You Need