Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models

Chao Xue; Yao Wang; Mengqiao Liu; Di Liang; Xingsheng Han; Peiyang Liu; Xianjie Wu; Chenyao Lu; Lei Jiang; Yu Lu; Haibo Shi; Shuang Liang; Minlong Peng; Flora D. Salim

arXiv:2604.10079 · cs.CL · April 27, 2026

TL;DR

This paper systematically investigates why supervised fine-tuning of large language models often fails to fully internalize training data, identifying multiple causes and proposing diagnostic and mitigation strategies.

Contribution

It formalizes the Incomplete Learning Phenomenon in LLM fine-tuning, identifies five key causes, and introduces a diagnostic framework with targeted interventions.

Findings

01. Incomplete learning is widespread across models and datasets.

02. Multiple causes contribute to unlearned data subsets.

03. Mitigation strategies can improve overall learning but may not eliminate all failures.

Abstract

Supervised Fine-Tuning (SFT) is the standard approach for adapting large language models (LLMs) to downstream tasks. However, we observe a persistent failure mode: even after convergence, models often fail to correctly reproduce a subset of their own supervised training data. We refer to this behavior as the Incomplete Learning Phenomenon (ILP). This paper presents the first systematic study of ILP in LLM fine-tuning. We formalize ILP as post-training failure to internalize supervised instances and demonstrate its prevalence across multiple model families, domains, and datasets. Through controlled analyses, we identify five recurrent sources of incomplete learning: (1) missing prerequisite knowledge in the pre-trained model, (2) conflicts between SFT supervision and pre-training knowledge, (3) internal inconsistencies within SFT data, (4) left-side forgetting during sequential…
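
The abstract describes ILP as a fine-tuned model failing to reproduce some of its own supervised training instances. One way this could be measured is sketched below. It is a minimal illustration, not the paper's procedure: the matching criterion (exact match after whitespace and case normalization), the function names, and the toy `generate` callable standing in for a fine-tuned model are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial formatting differences are ignored."""
    return " ".join(text.split()).lower()


def unlearned_instances(
    train_pairs: Sequence[Tuple[str, str]],
    generate: Callable[[str], str],
) -> List[int]:
    """Return indices of training pairs whose target the model does not reproduce.

    `generate` is a hypothetical stand-in for the fine-tuned model's decoding call.
    """
    failures = []
    for idx, (prompt, target) in enumerate(train_pairs):
        if normalize(generate(prompt)) != normalize(target):
            failures.append(idx)
    return failures


def incomplete_learning_rate(
    train_pairs: Sequence[Tuple[str, str]],
    generate: Callable[[str], str],
) -> float:
    """Fraction of the training set the model fails to reproduce (0.0 if empty)."""
    if not train_pairs:
        return 0.0
    return len(unlearned_instances(train_pairs, generate)) / len(train_pairs)


# Toy data and a lookup-table "model" used only to exercise the functions.
toy_data = [("2+2=", "4"), ("Capital of France?", "Paris"), ("3*3=", "9")]
toy_outputs = {"2+2=": "4", "Capital of France?": " paris ", "3*3=": "6"}


def toy_generate(prompt: str) -> str:
    return toy_outputs.get(prompt, "")


failed = unlearned_instances(toy_data, toy_generate)
rate = incomplete_learning_rate(toy_data, toy_generate)
```

In this toy run only the third pair is flagged, because the second differs from its target only in case and whitespace. A stricter or looser criterion (token-level match, a task-specific scorer) would change which instances count as unlearned, so the choice of criterion should be reported alongside any rate.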
