What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?

Katie Kang; Amrith Setlur; Dibya Ghosh; Jacob Steinhardt; Claire Tomlin; Sergey Levine; Aviral Kumar

arXiv:2411.07681 · cs.LG · November 19, 2024

TL;DR

This paper studies how learning dynamics during LLM fine-tuning shape generalization on reasoning tasks. It introduces a training-time metric, pre-memorization train accuracy, that predicts test performance and guides data curation.

Contribution

The paper introduces pre-memorization train accuracy, a metric that predicts generalization, and shows that using it to target data curation improves data efficiency.

Findings

01 Pre-memorization train accuracy predicts test accuracy at the dataset level, with $R^{2}$ around or exceeding 0.9.

02 Prioritizing examples with low pre-memorization train accuracy improves data efficiency by 1.5-2x.

03 The metric correlates with the robustness of individual predictions.
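As a rough illustration of the second finding, the sketch below ranks training examples by a per-example pre-memorization score and keeps the lowest-scoring ones. The function name `prioritize_examples`, the `scores` mapping, and the `budget` parameter are hypothetical, and this page does not say what the paper does with the selected examples, so the code should be read as one possible selection step under those assumptions rather than the authors' procedure.

```python
from typing import Dict, List


def prioritize_examples(scores: Dict[str, float], budget: int) -> List[str]:
    """Return the `budget` example ids with the lowest pre-memorization score.

    `scores` maps a (hypothetical) example id to its pre-memorization train
    accuracy in [0, 1]. Ties are broken by id so the result is deterministic.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(scores, key=lambda example_id: (scores[example_id], example_id))
    return ranked[:budget]


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the paper.
    demo_scores = {"a": 0.9, "b": 0.1, "c": 0.5}
    print(prioritize_examples(demo_scores, budget=2))
```

Sorting on the score alone would also work; the id is included in the sort key only so that repeated runs select the same examples when scores tie.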

Abstract

Despite the remarkable capabilities of modern large language models (LLMs), the mechanisms behind their problem-solving abilities remain elusive. In this work, we aim to better understand how the learning dynamics of LLM finetuning shape downstream generalization. Our analysis focuses on reasoning tasks, whose problem structure allows us to distinguish between memorization (the exact replication of reasoning steps from the training data) and performance (the correctness of the final solution). We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy: the accuracy of model samples on training queries before they begin to copy the exact reasoning steps from the training set. At the dataset level, this metric is able to reliably predict test accuracy, achieving an $R^{2}$ of around 0.9 or higher across…
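The definition in the abstract can be made concrete with a small sketch. It assumes that, for each training query, we already have a per-checkpoint record of sample accuracy and a flag saying whether the samples have started copying the training reasoning steps; how that flag is computed is not stated on this page and is left out. The `Checkpoint` record and both function names are hypothetical, and taking the best accuracy seen before memorization is one reading of the abstract, not necessarily the paper's exact formula.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Checkpoint:
    """One checkpoint's view of a single training query (hypothetical record)."""
    accuracy: float   # fraction of sampled solutions whose final answer is correct
    memorized: bool   # True once samples copy the training reasoning steps exactly


def pre_memorization_accuracy(history: List[Checkpoint]) -> float:
    """Best sample accuracy seen before memorization is first flagged.

    Assumption: `history` is ordered by training step. Returns 0.0 if the
    query is already memorized at the first checkpoint or history is empty.
    """
    best = 0.0
    for checkpoint in history:
        if checkpoint.memorized:
            break  # ignore everything from the first memorized checkpoint on
        best = max(best, checkpoint.accuracy)
    return best


def dataset_pre_memorization_accuracy(histories: Dict[str, List[Checkpoint]]) -> float:
    """Average the per-query scores to get a dataset-level number."""
    if not histories:
        raise ValueError("no training examples given")
    scores = [pre_memorization_accuracy(h) for h in histories.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative numbers only; these are not results from the paper.
    demo = {
        "q1": [Checkpoint(0.25, False), Checkpoint(0.75, False), Checkpoint(1.0, True)],
        "q2": [Checkpoint(0.0, False), Checkpoint(1.0, True)],
    }
    print(dataset_pre_memorization_accuracy(demo))
```

In this toy input, query `q2` reaches perfect accuracy only once it is memorized, so it contributes 0.0 to the average, which is the distinction between memorization and performance that the abstract draws.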

Code & Models

Repositories

katiekang1998/reasoning_generalization
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning · AI-based Problem Solving and Planning

Methods: Focus