Impact of Pretraining Term Frequencies on Few-Shot Reasoning
Yasaman Razeghi, Robert L. Logan IV, Matt Gardner, Sameer Singh

TL;DR
This paper investigates how the frequency of terms in pretraining data affects the reasoning performance of GPT-based language models on numerical tasks, revealing a strong correlation between term frequency and accuracy.
Contribution
It provides empirical evidence that term frequency in pretraining data significantly influences models' reasoning accuracy, highlighting potential limitations in generalization.
Findings
Models are more accurate on instances whose terms are frequent in the pretraining data, in some cases by more than 70 percentage points (absolute).
Performance correlates strongly with term frequency in pretraining data.
Raises questions about true reasoning capabilities beyond data exposure.
Abstract
Pretrained Language Models (LMs) have demonstrated the ability to perform numerical reasoning by extrapolating from a few examples in few-shot settings. However, the extent to which this extrapolation relies on robust reasoning is unclear. In this paper, we investigate how well these models reason with terms that are less frequent in the pretraining data. In particular, we examine the correlations between the model performance on test instances and the frequency of terms from those instances in the pretraining data. We measure the strength of this correlation for a number of GPT-based language models (pretrained on the Pile dataset) on various numerical deduction tasks (e.g., arithmetic and unit conversion). Our results consistently demonstrate that models are more accurate on instances whose terms are more prevalent, in some cases above 70% (absolute) more accurate on the top 10%…
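The comparison described in the abstract (accuracy on the most frequent terms versus the least frequent) can be illustrated with a minimal sketch. This is not the authors' code: the function name `accuracy_gap`, the `(frequency, is_correct)` input format, and the toy data are assumptions made for illustration only.

```python
from typing import List, Tuple


def accuracy_gap(instances: List[Tuple[int, bool]], percent: int = 10) -> float:
    """Accuracy on the top `percent`% most frequent instances minus
    accuracy on the bottom `percent`% (hypothetical helper).

    Each instance is (term frequency in pretraining data, model was correct).
    """
    if not instances:
        raise ValueError("need at least one instance")
    # Sort by term frequency, least frequent first.
    ranked = sorted(instances, key=lambda pair: pair[0])
    # Size of each slice; always keep at least one instance.
    k = max(1, len(ranked) * percent // 100)
    bottom, top = ranked[:k], ranked[-k:]

    def accuracy(group: List[Tuple[int, bool]]) -> float:
        return sum(1 for _, correct in group if correct) / len(group)

    return accuracy(top) - accuracy(bottom)


# Toy data (invented): the model is right only on terms seen more than 10 times.
toy = [(freq, freq > 10) for freq in range(1, 21)]
gap = accuracy_gap(toy)
```

A gap near zero would suggest accuracy does not depend on term frequency; a large positive gap is the pattern the paper reports.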
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
