ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Tao Liu; Taiqiang Wu; Runming Yang; Shaoning Sun; Junjie Wang; Yujiu Yang

arXiv:2601.09195·cs.CL·May 7, 2026

TL;DR

ProFit introduces a probability-guided token selection method that masks low-probability tokens during supervised fine-tuning of LLMs, reducing overfitting and improving performance on reasoning and math tasks.

Contribution

The paper links token probability to semantic importance and proposes a masking strategy that improves SFT without requiring multiple reference answers.

Findings

01 ProFit outperforms traditional SFT on reasoning benchmarks.

02 Masking low-probability tokens reduces overfitting.

03 The approach improves model generalization in reasoning and math tasks.

Abstract

Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, leading to the model overfitting to non-core expressions. Although our empirical analysis suggests that introducing multiple reference answers can mitigate this issue, the prohibitive data and computational costs necessitate a strategic shift: prioritizing the mitigation of single-reference overfitting over the costly pursuit of answer diversity. To achieve this, we reveal the intrinsic connection between token probability and semantic importance: high-probability tokens carry the core logical framework, while low-probability tokens are mostly replaceable expressions. Based on this insight, we propose ProFit, which selectively masks…
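
To make the masking idea concrete, here is a minimal sketch of a probability-gated SFT loss in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract is truncated before it gives the exact selection rule, so the fixed threshold `tau`, the choice to take probabilities from the model being trained, and the function names `target_probs` and `masked_sft_loss` are all hypothetical.

```python
import math

def target_probs(logits, targets):
    """Softmax probability assigned to each reference token.

    logits:  one list of vocabulary scores per position
    targets: the reference token id at each position
    """
    probs = []
    for row, t in zip(logits, targets):
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in row]
        probs.append(exps[t] / sum(exps))
    return probs

def masked_sft_loss(logits, targets, tau=0.5):
    """Mean negative log-likelihood over tokens with probability >= tau.

    ASSUMPTION: a fixed threshold `tau` decides which tokens are masked.
    In a real training loop the mask would be computed without gradients
    so that it acts as a constant selector.
    """
    probs = target_probs(logits, targets)
    keep = [p >= tau for p in probs]  # False = masked out of the loss
    kept = [-math.log(p) for p, k in zip(probs, keep) if k]
    if not kept:  # every token was masked
        return 0.0, keep
    return sum(kept) / len(kept), keep

# Toy example: 3 positions, vocabulary of size 3.
logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
targets = [0, 2, 1]
loss, keep = masked_sft_loss(logits, targets, tau=0.5)
print(keep)  # prints [True, False, True]
```

In the toy example the middle token has probability 1/3 under a uniform distribution, so it falls below `tau` and contributes nothing to the loss. The other two tokens are predicted confidently and are kept.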

Code & Models

Repositories

utaotao/ProFit (GitHub)