The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

TL;DR
Scaling down large language models by more than 30% impairs their ability to recall facts, whereas their in-context learning capabilities remain largely intact even at 60–70% reductions. Scaling therefore affects these two core skills very differently.
Contribution
This study shows that reducing model size degrades fact recall far more severely than in-context learning, revealing that scaling affects the core capabilities of LLMs unevenly.
Findings
Fact recall drops significantly once model size is reduced by more than 30%.
In-context processing remains robust even at 60–70% size reduction.
Scaling effects differ markedly between fact recall and in-context learning.
Abstract
How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques, weight pruning and simply training a smaller or larger model (which we refer to as dense scaling), and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60–70% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning…
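To make "reducing the model size by X% via weight pruning" concrete, here is a minimal sketch of global unstructured magnitude pruning, which zeroes out the given fraction of weights with the smallest absolute values. This summary does not say which pruning algorithm the authors used, so magnitude pruning is only an assumed, commonly used stand-in; the function names `magnitude_prune` and `fraction_zero` and the toy two-layer "model" are hypothetical illustrations, not the paper's code.

```python
import numpy as np


def magnitude_prune(weights: dict[str, np.ndarray], sparsity: float) -> dict[str, np.ndarray]:
    """Hypothetical helper: zero the `sparsity` fraction of weights with the
    smallest absolute value, ranked globally across all tensors.
    (Assumed stand-in for the paper's pruning method.)"""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # Pool all magnitudes so the threshold is global, not per layer.
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    k = int(round(sparsity * all_mags.size))
    if k == 0:
        return {name: w.copy() for name, w in weights.items()}
    # The k-th smallest magnitude; every weight at or below it is pruned.
    threshold = np.partition(all_mags, k - 1)[k - 1]
    return {name: np.where(np.abs(w) <= threshold, 0.0, w) for name, w in weights.items()}


def fraction_zero(weights: dict[str, np.ndarray]) -> float:
    """Fraction of all parameters that are exactly zero (the size reduction)."""
    total = sum(w.size for w in weights.values())
    zeros = sum(int(np.count_nonzero(w == 0)) for w in weights.values())
    return zeros / total


# Toy "model": two random weight matrices (hypothetical shapes).
rng = np.random.default_rng(0)
model = {"layer1": rng.standard_normal((64, 64)), "layer2": rng.standard_normal((64, 16))}

# Reductions matching the thresholds discussed in the abstract.
for s in (0.3, 0.6, 0.7):
    pruned = magnitude_prune(model, s)
    print(f"target {s:.0%} -> actual {fraction_zero(pruned):.1%}")
```

Ranking magnitudes globally (rather than per layer) lets some layers end up sparser than others. Dense scaling, the other technique in the abstract, instead reaches a smaller size by training a smaller model from scratch, so it has no analogue to this post-hoc masking step.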
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Pruning
