The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

Tian Jin; Nolan Clement; Xin Dong; Vaishnavh Nagarajan; Michael Carbin; Jonathan Ragan-Kelley; Gintare Karolina Dziugaite

arXiv:2310.04680 · cs.CL · October 10, 2023

TL;DR

Scaling down large language models by over 30% impairs their fact recall abilities, but their in-context learning capabilities remain largely intact even with reductions of 60-70%, indicating different effects of scaling on core skills.

Contribution

This study reveals that model size reduction impacts fact recall more severely than in-context learning, highlighting a disparity in how scaling affects different core capabilities of LLMs.

Findings

01. Fact recall drops significantly once model size is reduced by more than 30%.

02. In-context processing remains robust even after a 60-70% size reduction.

03. Scaling affects fact recall and in-context learning in markedly different ways.

Abstract

How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques, weight pruning and simply training a smaller or larger model (which we refer to as dense scaling), and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60-70% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning…
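To make "reducing the model size by 30%" via weight pruning concrete, here is a minimal sketch of unstructured magnitude pruning on a toy weight list. The abstract does not say which pruning algorithm the authors use, so the magnitude criterion, the function name `magnitude_prune`, and the example weights are all assumptions chosen for illustration only.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Illustrative assumption: a simple magnitude criterion, not necessarily
    the pruning method used in the paper.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = round(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Hypothetical toy weights; a real LLM layer holds millions of values.
weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6, -0.4, 0.02, 0.9, -0.1]
pruned = magnitude_prune(weights, 0.3)  # remove 30% of the weights
remaining = sum(1 for w in pruned if w != 0.0)
print(pruned, remaining)
```

Dense scaling, by contrast, would not zero entries in an existing model; it would train a separate model with fewer parameters from the start.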

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Pruning