SED-SFT: Selectively Encouraging Diversity in Supervised Fine-Tuning

Yijie Chen; Yijin Liu; Fandong Meng

arXiv:2602.07464 · cs.CL · February 10, 2026

TL;DR

SED-SFT introduces an adaptive method to encourage diversity during supervised fine-tuning of large language models, effectively reducing mode collapse and improving subsequent reinforcement learning performance without significant computational costs.

Contribution

The paper proposes a selective entropy regularization mechanism that balances diversity and accuracy during fine-tuning, addressing mode collapse in LLM training.

Findings

01. Significantly improves generation diversity across eight benchmarks.
02. Achieves average RL performance gains of over 2 points on Llama-3.2-3B-Instruct.
03. Maintains low computational overhead compared to standard CE loss.

Abstract

Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has emerged as the standard post-training paradigm for large language models (LLMs). However, the conventional SFT process, driven by Cross-Entropy (CE) loss, often induces mode collapse, where models over-concentrate on specific response patterns. This lack of distributional diversity severely restricts the exploration efficiency required for subsequent RL. While recent studies have attempted to improve SFT by replacing the CE loss, aiming to preserve diversity or refine the update policy, they fail to adequately balance diversity and accuracy, thereby yielding suboptimal performance after RL. To address the mode collapse problem, we propose SED-SFT, which adaptively encourages diversity based on the token exploration space. This framework introduces a selective entropy regularization term with a selective masking…
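
As a rough illustration of the general idea of adding a masked entropy bonus to a CE objective, the sketch below computes the mean token-level CE and subtracts a weighted mean entropy taken only over selected positions. It is not the authors' formulation: the abstract is truncated before the masking rule is stated, so the selection criterion used here (keep a position when its predictive entropy is at least `min_entropy`) and the names `selective_entropy_loss`, `lam`, and `min_entropy` are assumptions made purely for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def selective_entropy_loss(token_logits, targets, lam=0.1, min_entropy=0.5):
    """Return (loss, ce, selected) for one sequence.

    loss = mean CE - lam * mean entropy over the selected positions.
    ASSUMPTION: the mask keeps positions whose predictive entropy is at
    least `min_entropy`. This is a placeholder rule, not the paper's.
    """
    ce_total = 0.0
    ent_total = 0.0
    selected = 0
    for logits, target in zip(token_logits, targets):
        probs = softmax(logits)
        ce_total += -math.log(probs[target])  # standard CE term
        h = entropy(probs)
        if h >= min_entropy:  # hypothetical selective mask
            ent_total += h
            selected += 1
    ce = ce_total / len(targets)
    ent = ent_total / selected if selected else 0.0
    return ce - lam * ent, ce, selected


# Position 0 is maximally uncertain; position 1 is nearly deterministic.
logits = [[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]]
loss, ce, selected = selective_entropy_loss(logits, targets=[0, 0])
print(f"loss={loss:.4f} ce={ce:.4f} selected_positions={selected}")
```

With this placeholder mask, only the uncertain position contributes to the entropy bonus, so confident predictions are left to the CE term alone. Setting `lam=0` recovers plain CE.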

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis