Gradients Must Earn Their Influence: Unifying SFT with Generalized Entropic Objectives

Zecheng Wang; Deyuan Liu; Chunshan Li; Yupeng Zhang; Zhengyun Zhao; Dianhui Chu; Bingning Wang; Dianbo Sui

arXiv:2602.11424·cs.CL·February 13, 2026

Open Access

TL;DR

This paper introduces a unified framework for supervised fine-tuning that adaptively balances learning from uncertain and confident predictions, improving model robustness and performance.

Contribution

It unifies token-level SFT objectives within a generalized deformed-log family and proposes DEFT, a parameter-free method that dynamically modulates trust in predictions based on entropy.

Findings

01. DEFT outperforms existing methods in balancing exploration and exploitation.

02. The universal gate-error gradient structure provides insights into model trust dynamics.

03. Experimental results show improved robustness and accuracy across tasks.

Abstract

Standard negative log-likelihood (NLL) for Supervised Fine-Tuning (SFT) applies uniform token-level weighting. This rigidity creates a two-fold failure mode: (i) overemphasizing low-probability targets can amplify gradients on noisy supervision and disrupt robust priors, and (ii) uniform weighting provides weak sharpening when the model is already confident. Existing methods fail to resolve the resulting plasticity-stability dilemma, often suppressing necessary learning signals alongside harmful ones. To address this issue, we unify token-level SFT objectives within a generalized deformed-log family and expose a universal gate $\times$ error gradient structure, where the gate controls how much the model trusts its current prediction. By employing the Cayley transform, we map the model's continuously evolving uncertainty onto a continuous focus trajectory, which enables seamless…
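The gate $\times$ error structure mentioned in the abstract can be illustrated with a small sketch. It assumes the Tsallis q-logarithm, $\log_q(p) = (p^{1-q} - 1)/(1-q)$, as one member of a deformed-log family; the paper's exact parameterization, the Cayley-transform mapping, and DEFT itself are not reproduced here. Under that assumption, the gradient of $-\log_q(p_y)$ with respect to the logits factors as $p_y^{1-q}\,(p - e_y)$: a scalar gate times the usual softmax error. All function names below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def deformed_log(p, q):
    """Tsallis q-logarithm (an assumed stand-in for the paper's family).
    Reduces to the natural log as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(p)
    return (p ** (1.0 - q) - 1.0) / (1.0 - q)

def loss(logits, target, q):
    """Token-level loss -log_q(p_target)."""
    return -deformed_log(softmax(logits)[target], q)

def gate_error_grad(logits, target, q):
    """Analytic gradient w.r.t. logits, written as gate * error.
    gate  = p_target ** (1 - q)   (scalar)
    error = p - one_hot(target)   (the standard NLL gradient)"""
    probs = softmax(logits)
    gate = probs[target] ** (1.0 - q)
    error = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return gate, [gate * e for e in error]

def numeric_grad(logits, target, q, eps=1e-6):
    """Central finite differences, used only to check the factorization."""
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += eps
        down = list(logits)
        down[i] -= eps
        grads.append((loss(up, target, q) - loss(down, target, q)) / (2 * eps))
    return grads

if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5]
    for q in (0.0, 0.5, 1.0):
        gate, grad = gate_error_grad(logits, target=1, q=q)
        print(f"q={q}: gate={gate:.4f}, grad={[round(g, 4) for g in grad]}")
```

In this sketch, q = 1 gives a gate of exactly 1 and recovers the NLL gradient, while q = 0 gives a gate equal to the target probability, so low-probability targets contribute smaller gradients. This is the sense in which the gate expresses how much the model's current prediction is trusted.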

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications