SED-SFT: Selectively Encouraging Diversity in Supervised Fine-Tuning
Yijie Chen, Yijin Liu, Fandong Meng

TL;DR
SED-SFT introduces an adaptive method to encourage diversity during supervised fine-tuning of large language models, effectively reducing mode collapse and improving subsequent reinforcement learning performance without significant computational costs.
Contribution
The paper proposes a selective entropy regularization mechanism that balances diversity and accuracy during fine-tuning, addressing mode collapse in LLM training.
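To make the idea concrete, here is a minimal sketch of what a selectively masked entropy regularizer on top of CE loss could look like. It is not the paper's formulation: the masking rule (apply the bonus only where predictive entropy exceeds a threshold `tau`), the weight `lam`, and all function names are assumptions chosen for illustration, since the summary does not specify how the mask is computed.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token position.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def selective_entropy_loss(logits_seq, targets, lam=0.1, tau=0.5):
    """Mean CE minus an entropy bonus applied only at masked positions.

    lam (bonus weight) and tau (entropy threshold for the mask) are
    hypothetical knobs for this sketch, not values from the paper.
    """
    ce_total, bonus_total = 0.0, 0.0
    for logits, target in zip(logits_seq, targets):
        probs = softmax(logits)
        ce_total += -math.log(probs[target])
        h = entropy(probs)
        # Assumed masking rule: regularize only positions that already
        # have a wide predictive distribution (entropy above tau).
        mask = 1.0 if h > tau else 0.0
        bonus_total += mask * h
    n = len(targets)
    return ce_total / n - lam * bonus_total / n

# Two positions: one nearly deterministic, one uniform over three tokens.
logits_seq = [[5.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
targets = [0, 1]
print(selective_entropy_loss(logits_seq, targets, lam=0.0))  # plain CE
print(selective_entropy_loss(logits_seq, targets, lam=0.1))  # with bonus
```

In this toy input only the second position passes the mask, so the regularizer lowers the loss there while leaving the confident first position to plain CE. That is the intended intuition of encouraging diversity only where several continuations are plausible.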
Findings
Significantly improves generation diversity across eight benchmarks.
Achieves average RL performance gains of over 2 points on Llama-3.2-3B-Instruct.
Maintains low computational overhead compared to standard CE loss.
Abstract
Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has emerged as the standard post-training paradigm for large language models (LLMs). However, the conventional SFT process, driven by Cross-Entropy (CE) loss, often induces mode collapse, where models over-concentrate on specific response patterns. This lack of distributional diversity severely restricts the exploration efficiency required for subsequent RL. While recent studies have attempted to improve SFT by replacing the CE loss, aiming to preserve diversity or refine the update policy, they fail to adequately balance diversity and accuracy, thereby yielding suboptimal performance after RL. To address the mode collapse problem, we propose SED-SFT, which adaptively encourages diversity based on the token exploration space. This framework introduces a selective entropy regularization term with a selective masking…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
