Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency
Matthew L. Smith, Jonathan P. Shock, Samuel T. Segun, Iyiola E. Olatunji, Tegawendé F. Bissyandé

TL;DR
This study reveals that factual recall in large language models depends on model size and topic frequency, following a sigmoid pattern, and provides a quantitative scaling law linking these factors.
Contribution
It introduces a new scaling law connecting model size, topic frequency, and factual recall, validated across multiple model families and datasets.
Findings
Recall quality follows a sigmoid in the log-linear combination of model size and topic frequency.
Model size and topic frequency explain up to 94% of variance in recall within model families.
Recall is gated by a signal-to-noise ratio influenced by concept frequency and model capacity.
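Written out, the first finding has roughly the following shape. The symbols are assumptions introduced for illustration, not necessarily the paper's notation: R is recall quality, N the model's parameter count, f the topic's frequency in the training data, and \alpha, \beta, \gamma are fitted coefficients.

```latex
R(N, f) \;=\; \sigma\bigl(\alpha \log N + \beta \log f + \gamma\bigr),
\qquad
\sigma(z) \;=\; \frac{1}{1 + e^{-z}}
```

Because the argument is linear in \log N and \log f, it equals the logarithm of e^{\gamma} N^{\alpha} f^{\beta}, a product of power laws. That is one possible way to read the signal-to-noise gating in the third finding; the exact form of the ratio is not stated in this summary.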
Abstract
While scaling laws govern aggregate large language model performance, no scaling law has linked factual recall to both model size and training-data composition. We evaluated 38 models on over 8,900 scholarly references, each checked by an automated reference verification system. Recall quality follows a sigmoid in the log-linear combination of model parameter count and topic representation in training data. These two variables alone explain 60% of the variance across 16 dense models from four families, rising to 74-94% within individual families. The form matches a superposition-inspired account in which recall is gated by a signal-to-noise ratio: signal strength scales with concept frequency and the noise floor with model capacity.
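As a rough illustration of what "a sigmoid in the log-linear combination" means in practice, the sketch below evaluates such a curve over a small grid of model sizes and topic frequencies. The function name `recall_quality` and the coefficients `a`, `b`, `c` are hypothetical placeholders chosen for readability; they are not fitted values from the paper.

```python
import math

def recall_quality(params, topic_freq, a=1.0, b=1.0, c=-12.0):
    """Sigmoid of a log-linear combination of model size and topic frequency.

    a, b, c are made-up illustrative coefficients, not the paper's estimates.
    """
    # Log-linear predictor: weighted sum of the two log-scaled inputs.
    z = a * math.log10(params) + b * math.log10(topic_freq) + c
    # Logistic function squashes the predictor into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    # Hypothetical grid: three model sizes, three topic frequencies.
    for params in (1e9, 7e9, 70e9):
        row = [round(recall_quality(params, f), 3) for f in (1e1, 1e3, 1e5)]
        print(f"{params:.0e} params:", row)
```

With equal weights on the two log terms, a tenfold increase in model size offsets a tenfold drop in topic frequency. Under real fitted coefficients the exchange rate would generally differ.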
