TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

Truong Nguyen, Tien-Phat Nguyen, Linh Ngo Van, Duy Minh Ho Nguyen, Khoa D. Doan, and Trung Le

arXiv:2605.12288·cs.CL·May 15, 2026

TL;DR

TokenRatio introduces TBPO, a token-level preference optimization method that aligns language models on per-token decisions rather than full sequences, and reports improved training stability and output diversity.

Contribution

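The paper proposes Token-level Bregman Preference Optimization (TBPO): a token-level Bradley-Terry preference model over next-token actions conditioned on the prefix, trained with a Bregman-divergence density-ratio matching objective that generalizes the logistic/DPO loss.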

Findings

01. TBPO outperforms sequence-level baselines in alignment quality.

02. TBPO enhances training stability and output diversity.

03. Two instantiations, TBPO-Q and TBPO-A, demonstrate effectiveness across benchmarks.

Abstract

Direct Preference Optimization (DPO) is a widely used RL-free method for aligning language models from pairwise preferences, but it models preferences over full sequences even though generation is driven by per-token decisions. Existing token-level extensions typically decompose a sequence-level Bradley-Terry objective across timesteps, leaving per-prefix (state-wise) optimality implicit. We study how to recover token-level preference optimality using only standard sequence-level pairwise comparisons. We introduce Token-level Bregman Preference Optimization (TBPO), which posits a token-level Bradley-Terry preference model over next-token actions conditioned on the prefix, and derive a Bregman-divergence density-ratio matching objective that generalizes the logistic/DPO loss while preserving the optimal policy induced by the token-level model and maintaining DPO-like simplicity. We…
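For context, the sequence-level logistic loss that the abstract says TBPO generalizes can be sketched as follows. This is the standard DPO loss for a single preference pair, not the TBPO objective, which the abstract does not specify in detail. The function names, the `beta` default, and the list-of-log-probabilities input format are illustrative assumptions.

```python
import math


def sequence_logprob(token_logprobs):
    """Sum per-token log-probabilities into one sequence log-probability."""
    return sum(token_logprobs)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard sequence-level DPO loss for one (chosen, rejected) pair.

    Each argument is a list of per-token log-probabilities (hypothetical
    input format): the policy's and the reference model's scores for the
    chosen and rejected responses.
    """
    # Log density ratio of policy to reference, aggregated over the sequence.
    chosen_ratio = sequence_logprob(pol_chosen) - sequence_logprob(ref_chosen)
    rejected_ratio = sequence_logprob(pol_rejected) - sequence_logprob(ref_rejected)
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    ref_w, ref_l = [-1.0, -2.0], [-1.5, -0.5]
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(ref_w, ref_l, ref_w, ref_l))
    # Policy assigns more mass to the chosen response: loss drops below log(2).
    print(dpo_loss([-0.5, -1.0], ref_l, ref_w, ref_l))
```

Because the per-token log-ratios are summed before the loss is applied, the preference signal only constrains the sequence as a whole. This is the sequence-level modeling that the abstract contrasts with a preference model over next-token actions conditioned on the prefix.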
