Learning Dynamics of LLM Finetuning

Yi Ren; Danica J. Sutherland

arXiv:2407.10490 · cs.LG · July 1, 2025 · 1 citation

TL;DR

This paper studies the learning dynamics of large language models during finetuning. It decomposes, step by step, how influence accumulates across potential responses, uses this to explain phenomena such as reinforced hallucination and the squeezing effect, and draws from it insights for improving alignment.

Contribution

It introduces a unified framework for analyzing LLM finetuning dynamics that explains hallucination strengthening and the squeezing effect, and it proposes a simple method to improve alignment.

Findings

01

Specific types of hallucination are reinforced after finetuning.

02

The squeezing effect explains why output quality degrades when DPO is run for a long time.

03

These insights lead to a simple method for improving alignment.
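A toy sketch of the squeezing effect in finding 02, under simplifying assumptions that are ours rather than the paper's: the "model" is a single softmax over four free logits (no shared features), and each step is gradient descent that lowers the log-probability of an already unlikely "rejected" option, the kind of negative gradient that DPO applies. Because the gradient of log pi_k with respect to the logits is e_k - pi, each such step raises every other logit in proportion to its current probability, so the most likely option gains the most and the distribution becomes peakier. The starting logits, learning rate, and step count are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def push_down_step(z, k, lr):
    # One gradient step that DECREASES log softmax(z)[k], acting directly on
    # the logits (assumption: free logits, i.e. no shared feature map).
    # d log pi_k / dz = e_k - pi, so we step against that direction.
    p = softmax(z)
    grad = -p
    grad[k] += 1.0
    return z - lr * grad

z = np.array([2.0, 1.0, 0.0, -1.0])  # hypothetical logits; option 3 is the unlikely "rejected" one
p0 = softmax(z)
for _ in range(50):
    z = push_down_step(z, k=3, lr=1.0)
p1 = softmax(z)

print("before:", np.round(p0, 3))
print("after: ", np.round(p1, 3))
```

Comparing the two printed distributions, the probability removed from option 3 is not shared evenly: the ratio between the top option and every other option grows at each step, which is the "squeezing" toward the already most likely output.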

Abstract

Learning dynamics, which describes how the learning of specific training examples influences the model's predictions on other examples, gives us a powerful tool for understanding the behavior of deep learning systems. We study the learning dynamics of large language models during different types of finetuning, by analyzing the step-wise decomposition of how influence accumulates among different potential responses. Our framework allows a uniform interpretation of many interesting observations about the training of popular algorithms for both instruction tuning and preference tuning. In particular, we propose a hypothetical explanation of why specific types of hallucination are strengthened after finetuning, e.g., the model might use phrases or facts in the response for question B to answer question A, or the model might keep repeating similar simple phrases when generating responses. We…
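A first-order sketch of what a step-wise decomposition of this kind can look like. The notation here is assumed for illustration and is not quoted from the paper: z = h_theta(x) are the model's logits, pi_theta = softmax(z), one SGD step of size eta is taken on a loss L at a training example (x_u, y_u), and we track the resulting change in log-probabilities at an observed input x_o. Applying the chain rule gives three factors, and only the last one depends on the finetuning loss.

```latex
\Delta \log \pi_{\theta}(\cdot \mid x_o)
  \approx -\eta\,
  \underbrace{\nabla_{z}\log \pi_{\theta}(\cdot \mid x_o)}_{\mathcal{A}(x_o)}\;
  \underbrace{\nabla_{\theta} z(x_o)\,\nabla_{\theta} z(x_u)^{\top}}_{\mathcal{K}(x_o,\,x_u)}\;
  \underbrace{\nabla_{z}\mathcal{L}(x_u, y_u)}_{\mathcal{G}(x_u,\,y_u)}
  \;+\; O(\eta^{2})
```

Here the gradient of z with respect to theta denotes the Jacobian of the logits, so the middle factor is an empirical neural tangent kernel between x_o and x_u, and for a softmax the first factor equals I - 1 pi_theta(. | x_o)^T. Swapping the loss (for example, supervised finetuning versus a preference loss) changes only the last factor.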

Peer Reviews

Decision · ICLR 2025 Oral

Reviewer 01 · Rating 10 · Confidence 4

Strengths

1. They motivate and ground the study of training dynamics for LLMs well and highlight key difficulties in applying standard influence-function-style analyses to LLM finetuning and preference optimization, a highlight of the preliminary material.
2. The formalism and analysis in Section 3 are generally clear and well written. Figure 2 and the bullet-point enumeration in 3.3 are concise presentations of the core claims of the analysis.
3. The experimental design of the empirical verification sect

Weaknesses

1. The clarity of Sec 3.1 when discussing the causal masking and teacher-forced production of the full next-token logits set could be improved, though this is admittedly tricky. It is possible that for some readers, a diagram of the matrix structure here could be helpful since most papers don't tackle the more complex formulation of the influence problem and so readers may not be clear on it. (The reviewer imagines a teacher-forced causally masked model forward on input and label sequences X,Y
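A minimal sketch of the teacher-forced, causally masked forward pass the reviewer describes, built on a hypothetical toy model: random embeddings `E`, an output head `W`, and a running-mean "encoder" stand in for a transformer, and none of these names come from the paper or its repository. One forward pass over the concatenated prompt x and response y yields an L x V block of logits, one row per response token, where the row for token l sees only x and y_{<l}; summing the selected log-probabilities gives log pi(y | x).

```python
import numpy as np

V = 5                                 # hypothetical vocabulary size
rng = np.random.default_rng(0)
E = rng.normal(size=(V, 8))           # hypothetical token embeddings
W = rng.normal(size=(8, V))           # hypothetical output head

def toy_causal_logits(tokens):
    # Row t depends only on tokens[: t + 1] (running mean of embeddings),
    # mimicking the causal-masking property of a decoder-only transformer.
    h = np.cumsum(E[tokens], axis=0) / np.arange(1, len(tokens) + 1)[:, None]
    return h @ W                      # shape (len(tokens), V)

def log_softmax(z):
    # Numerically stable log-softmax over the vocabulary axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def response_logprob(x, y):
    # Teacher forcing: one pass over [x, y]; position len(x) - 1 + l predicts y[l].
    seq = np.concatenate([x, y])
    logits = toy_causal_logits(seq)[len(x) - 1 : len(seq) - 1]  # (L, V)
    lp = log_softmax(logits)
    return lp[np.arange(len(y)), y], logits  # per-token log-probs and logit block

x = np.array([0, 1])                  # toy prompt token ids
y = np.array([2, 3, 4])               # toy response token ids
token_lp, logits = response_logprob(x, y)
print(logits.shape)                   # (3, 5)
print(token_lp.sum())                 # log pi(y | x) under the toy model
```

Each row of `logits` equals `toy_causal_logits(np.concatenate([x, y[:l]]))[-1]`, the logits that step-by-step autoregressive decoding would produce, which is exactly what causal masking buys: all L next-token distributions from a single forward pass.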

Reviewer 02 · Rating 8 · Confidence 3

Strengths

1) The paper has strong theoretical and experimental backing for its analysis.
2) It is the first paper to propose a framework extending learning dynamics to LLMs.

Weaknesses

1) Did not find any specific weaknesses.

Reviewer 03 · Rating 8 · Confidence 3

Strengths

The paper is well written overall and clearly explains the motivation and the approach. The use of simple "pedagogical" examples also helps communicate the main message of the paper. The main contribution seems to be the application of the theory to the different loss functions used in LLM finetuning, making explicit use of the "decomposition" of the effective learning dynamics into 3 different terms and identifying which one directly depends on the loss function being used. This is demonstr

Weaknesses

There are two major limitations that, while (somewhat) acknowledged by the authors, would benefit from a more careful discussion and/or analysis.
1. The first is the obvious limitation that the entire analysis assumes the "feature map" is held fixed and only the classification layer changes during learning, but the extent to which this assumption holds in practice is not evaluated at all. Without such an evaluation (for example, either by quanti

Code & Models

Repositories

joshua-ren/learning_dynamics_llm
PyTorch · Official

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Fuzzy Logic and Control Systems · Natural Language Processing Techniques

Methods: Direct Preference Optimization