Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization

Meng Li; Guangda Huzhang; Haibo Zhang; Xiting Wang; Anxiang Zeng

arXiv:2505.18720·cs.CL·May 27, 2025

TL;DR

This paper introduces OTPO, a novel token weighting scheme based on optimal transport, to improve preference optimization in LLMs by emphasizing meaningful tokens and reducing noise influence.

Contribution

It proposes an adaptive, context-aware token weighting method using optimal transport to enhance preference optimization in large language models.

Findings

01. OTPO improves instruction-following performance.

02. Enhanced interpretability of preference models.

03. Increased reward stability during training.

Abstract

Direct Preference Optimization (DPO) has emerged as a promising framework for aligning Large Language Models (LLMs) with human preferences by directly optimizing the log-likelihood difference between chosen and rejected responses. However, existing methods assign equal importance to all tokens in the response, while humans focus on more meaningful parts. This leads to suboptimal preference optimization, as irrelevant or noisy tokens disproportionately influence DPO loss. To address this limitation, we propose Optimal Transport-based token weighting scheme for enhancing direct Preference Optimization (OTPO). By emphasizing semantically meaningful token pairs and de-emphasizing less relevant ones, our method introduces a context-aware token weighting scheme that yields a more contrastive reward difference estimate. This adaptive weighting enhances…
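The abstract does not spell out the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes per-token hidden states for the chosen and rejected responses, a cosine-distance cost, balanced entropy-regularized optimal transport solved by Sinkhorn iterations, and a hypothetical rule that turns the transport plan into per-token weights. The names `sinkhorn_plan`, `token_weights`, `weighted_dpo_loss`, and the parameters `eps` and `beta` are illustrative choices, not taken from the paper or its repository.

```python
import numpy as np

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan between two uniform token distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)            # uniform mass on chosen tokens
    b = np.full(m, 1.0 / m)            # uniform mass on rejected tokens
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def token_weights(h_chosen, h_rejected, eps=0.1):
    """Hypothetical weighting rule (an assumption, not the paper's formula):
    score each token by the similarity-weighted mass it exchanges with the
    other response, then rescale so the weights sum to the sequence length."""
    hc = h_chosen / np.linalg.norm(h_chosen, axis=1, keepdims=True)
    hr = h_rejected / np.linalg.norm(h_rejected, axis=1, keepdims=True)
    sim = hc @ hr.T                    # cosine similarity in [-1, 1]
    plan = sinkhorn_plan(1.0 - sim, eps)
    score = plan * (1.0 + sim) / 2.0   # non-negative pairwise scores
    w_c = score.sum(axis=1) + 1e-12
    w_r = score.sum(axis=0) + 1e-12
    w_c = w_c / w_c.sum() * len(w_c)   # all-ones weights recover plain DPO
    w_r = w_r / w_r.sum() * len(w_r)
    return w_c, w_r

def weighted_dpo_loss(logp_c, ref_logp_c, w_c,
                      logp_r, ref_logp_r, w_r, beta=0.1):
    """DPO loss where each token's log-ratio is scaled by its weight."""
    r_c = np.sum(w_c * (logp_c - ref_logp_c))   # weighted chosen reward
    r_r = np.sum(w_r * (logp_r - ref_logp_r))   # weighted rejected reward
    margin = beta * (r_c - r_r)
    return np.logaddexp(0.0, -margin)           # equals -log(sigmoid(margin))
```

With all weights set to one, `weighted_dpo_loss` reduces to the usual sequence-level DPO loss, which makes the effect of any weighting rule easy to compare against the unweighted baseline.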

Code & Models

Repositories

mimasss2/otpo
PyTorch · Official

Videos

Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization

Taxonomy

Topics: DNA and Biological Computing

Methods: Focus · Direct Preference Optimization