Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective
Junpeng Zhang, Lei Cheng, Guoxi Zhang, Hua Cai, Qing Xu, Quanshi Zhang

TL;DR
This paper investigates why supervised fine-tuning (SFT) is broadly effective for small neural networks but gives inconsistent results for large language models (LLMs), using interaction-based explanations to track how interactions evolve during fine-tuning.
Contribution
It shows that SFT mainly removes noise-like interactions and that overfitting sets in quickly, which offers new insight into early stopping and practical guidance for training LLMs.
Findings
SFT primarily removes noise-like interactions in LLMs.
The denoising stage in SFT is very brief.
Continued SFT tends to introduce overfitted interactions.
Abstract
This paper explores a scientific question in supervised fine-tuning (SFT): why SFT is broadly effective for small-scale deep neural networks, yet can produce inconsistent or even detrimental effects when applied to large language models (LLMs). Recent advances in interaction-based explanations suggest that interactions between words/tokens provide a faithful metric for quantifying the inference patterns encoded by LLMs. We find that the evolution of interactions during SFT can effectively explain the inconsistent effectiveness of SFT for LLMs. Specifically, we find that (1) SFT primarily removes noise-like interactions, while rarely acquiring reliable new interactions. (2) This denoising stage is extremely brief, after which continued fine-tuning tends to introduce overfitted interactions. We validate these findings across multiple LLMs and datasets. Our findings provide new insights…
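
To make "interactions between words/tokens" concrete, here is a minimal sketch. It assumes the Harsanyi interaction, a definition commonly used in interaction-based explanation work; this excerpt does not state which metric the paper uses, so treat the choice as an assumption. The function `toy_model_output` is a hypothetical stand-in for an LLM's scalar output when only a subset of input tokens is left unmasked.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def harsanyi_interaction(v, S):
    """Harsanyi interaction I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T).

    `v` maps a set of unmasked token indices to a scalar model output.
    Assumption: this is an illustrative metric, not confirmed as the paper's.
    """
    S = frozenset(S)
    return sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))


def toy_model_output(present):
    """Hypothetical model: token 0 contributes alone; tokens 1 and 2 only jointly."""
    score = 0.0
    if 0 in present:
        score += 1.0
    if {1, 2} <= present:
        score += 2.0
    return score


tokens = [0, 1, 2]
interactions = {S: harsanyi_interaction(toy_model_output, S) for S in subsets(tokens)}

# Print only the subsets that carry a nonzero interaction effect.
for S, value in interactions.items():
    if value != 0:
        print(sorted(S), value)
```

In this toy case only the subsets {0} and {1, 2} carry nonzero effects, and the effects over all subsets sum to the output on the full input. Under this view, a noise-like or overfitted interaction would be a nonzero effect on a token subset that does not reflect a reliable pattern. The enumeration is exponential in the number of tokens, so the sketch is practical only for a handful of input variables.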
