Token-Importance Guided Direct Preference Optimization

Ning Yang; Hai Lin; Yibo Liu; Baoliang Tian; Guoqing Liu; Haijun Zhang

arXiv:2505.19653 · cs.AI · March 3, 2026

TL;DR

This paper introduces TI-DPO, a framework for aligning large language models with human preferences that combines a hybrid token-importance weighting scheme with a triplet loss, aiming for more accurate, robust, and diverse responses.

Contribution

The paper presents TI-DPO, combining gradient attribution with a Gaussian prior and triplet loss to enhance token importance estimation and response alignment in LLMs.
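
For readers unfamiliar with the triplet loss named above, here is a minimal sketch of the standard margin-based form. It is a generic illustration, not the paper's formulation: this summary does not say what TI-DPO uses as anchor, positive, and negative, so the vector inputs, the Euclidean distance, and the `margin` default are all assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The inputs are plain vectors here (an assumption); the loss is zero once
    the negative is at least `margin` farther from the anchor than the positive.
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    d_pos = np.linalg.norm(anchor - positive)  # distance to the preferred sample
    d_neg = np.linalg.norm(anchor - negative)  # distance to the dispreferred sample
    return max(0.0, d_pos - d_neg + margin)
```

The loss only penalizes triplets that violate the margin, which is why it is commonly used to pull an anchor toward a preferred example while pushing it away from a dispreferred one.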

Findings

1. TI-DPO outperforms DPO and RLHF in accuracy and diversity.
2. It provides more stable and computationally efficient alignment.
3. The method improves fine-grained semantic control in LLMs.

Abstract

Aligning Large Language Models (LLMs) with human preferences is crucial for safe and effective AI interactions. While popular methods like Direct Preference Optimization (DPO) have simplified alignment, they remain sensitive to data noise and overlook the differential importance of individual tokens. Existing token-level approaches often rely on probability prediction or simplistic weighting schemes to obtain token importance, which still cannot fully address these issues. To solve this problem, we propose the Token-Importance Guided Direct Preference Optimization (TI-DPO), a framework that achieves fine-grained semantic control through two synergistic innovations. First, we propose a novel hybrid weighting mechanism that combines gradient attribution with a Gaussian prior, ensuring both the accuracy and robustness of token importance scores. Second, we employ a triplet loss to provide…
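
As a rough illustration of the hybrid weighting idea, the sketch below mixes normalized gradient-attribution magnitudes with a Gaussian prior over token positions, then uses the resulting weights in a token-weighted DPO-style loss. This is not the authors' implementation. The convex mixing coefficient `lam`, the prior centred at mid-sequence with relative width `sigma_frac`, the function names, and the exact form of the weighted loss are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def gaussian_prior(seq_len, sigma_frac=0.25):
    """Gaussian bump over token positions, centred mid-sequence (assumed)."""
    pos = np.arange(seq_len)
    mu = (seq_len - 1) / 2.0
    sigma = max(sigma_frac * seq_len, 1e-8)
    prior = np.exp(-0.5 * ((pos - mu) / sigma) ** 2)
    return prior / prior.sum()  # normalize to sum to 1

def hybrid_weights(grad_scores, lam=0.7, sigma_frac=0.25):
    """Convex mix of normalized |gradient attribution| and the Gaussian prior.

    `grad_scores` stands in for per-token attribution values computed elsewhere.
    """
    g = np.abs(np.asarray(grad_scores, dtype=float))
    total = g.sum()
    # Fall back to uniform weights if every attribution is zero.
    g = g / total if total > 0 else np.full(len(g), 1.0 / len(g))
    return lam * g + (1.0 - lam) * gaussian_prior(len(g), sigma_frac)

def weighted_dpo_loss(logp_w, ref_w, logp_l, ref_l, w_w, w_l, beta=0.1):
    """-log sigmoid of the token-weighted log-ratio margin (assumed form).

    logp_* / ref_*: per-token log-probs under the policy / reference model
    for the preferred (w) and dispreferred (l) responses.
    """
    margin = beta * (np.sum(w_w * (logp_w - ref_w)) - np.sum(w_l * (logp_l - ref_l)))
    return np.logaddexp(0.0, -margin)  # numerically stable -log(sigmoid(margin))
```

With `lam=1.0` the weights reduce to pure gradient attribution, and with `lam=0.0` to the positional prior alone. Setting every weight to 1 recovers the usual sequence-level DPO margin.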

Peer Reviews

Decision · ICLR 2026 Oral

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- The paper introduces a novel Token-Importance Guided Direct Preference Optimization (TI-DPO) framework that integrates gradient-based token attribution with a triplet loss objective, aiming to achieve finer-grained preference alignment.
- Theoretical analysis provides a formal derivation suggesting that TI-DPO attains a tighter loss bound than standard DPO, offering a potentially more stable optimization objective.

Weaknesses

- The motivation for employing a Gaussian prior in the hybrid weighting mechanism is insufficiently justified. The assumption that salient tokens cluster near the center of a sequence lacks both empirical evidence and theoretical grounding. Alternative priors or adaptive distributions are not discussed.
- Although the authors claim that TI-DPO achieves a tighter loss bound than DPO, the paper does not provide quantitative evidence to demonstrate the practical significance of this theoretical imp

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. **Well written and clearly presented.** The paper is well structured and makes effective use of figures and visualizations to support understanding.
2. **Novel contribution.** Introduces a new approach that weights token importance and incorporates a triplet loss for more fine-grained alignment.
3. **Sound theoretical analysis.** Provides theoretical guarantees showing how the proposed formulation relates to and improves upon vanilla DPO.
4. **Strong empirical results.** Experimental re

Weaknesses

1. **Limited generalizability.** The method is presented as DPO-specific, which limits its applicability to newer and potentially more effective alignment approaches such as GRPO and DRPO.
2. **Incomplete methodological details.** Some aspects of the triplet loss implementation are insufficiently described, making it difficult to fully understand how this component contributes to the overall improvement of the method.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. Originality: The paper introduces a novel perspective by extending preference optimization from the traditional sequence-level to a token-level framework, allowing fine-grained semantic control. The proposed Hybrid Weighting Mechanism utilizes gradient attribution to determine the contribution of each token to the model’s output, with a Gaussian prior incorporated as a regularization mechanism to enhance training stability. The use of the Triplet Loss for fine-grained preference alignment and

Weaknesses

1. Limited empirical validation of claimed advantages: Although the paper claims improvements in robustness (e.g., line 20, line 58, line 125), generative diversity (e.g., line 24), and computational efficiency (e.g., lines 24-25), these aspects are not discussed or quantitatively evaluated. There is a lack of experiments that measure robustness under noisy or perturbed preference data. The paper also does not discuss or evaluate the claimed diversity gains. For example, evaluate generative dive

Taxonomy

Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Recommender Systems and Techniques

Methods: Direct Preference Optimization