TAB-PO: Preference Optimization with a Token-Level Adaptive Barrier for Token-Critical Structured Generation

Samah Fodeh; Linhai Ma; Ganesh Puthiaraju; Srivani Talakokkul; Afshan Khan; Ashley Hagaman; Sarah R. Lowe; Aimee Kendall Roundtree

arXiv:2603.00025·cs.CL·March 3, 2026

TL;DR

This paper introduces TAB-PO, a preference optimization method for aligning language models on token-critical structured prediction tasks. It emphasizes task-important tokens and balances confidence, and it improves on both SFT and recent preference-optimization baselines.

Contribution

The paper proposes TAB-PO, a token-level adaptive barrier method that addresses limitations of Direct Preference Optimization (DPO) in low-separation, importance-skewed settings and improves structured prediction accuracy.

Findings

01. TAB-PO achieves a ~4% relative improvement in micro-F1 over SFT.

02. It outperforms recent preference-optimization baselines.

03. It is effective in medical annotation tasks with hierarchical labels and evidence spans.

Abstract

Direct Preference Optimization (DPO) is an offline post-SFT method for aligning language models from preference pairs, with strong results in instruction following and summarization. However, DPO's sequence-level implicit reward can be brittle for token-critical structured prediction settings such as medical annotation, which often exhibit (i) low-separation preference pairs, where chosen and rejected completions differ by minimal edit distance (often 1-3 tokens), and (ii) token-importance skew, where sparse semantic tokens (hierarchical labels and evidence spans) carry disproportionate task importance relative to high-frequency structural tokens (JSON scaffolding). In this regime, standard DPO suffers from margin collapse (insufficient log-probability separation between near-identical preferences), likelihood squeezing (the margin objective shifts the absolute likelihoods of both completions…
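
The low-separation regime described in the abstract can be illustrated with a minimal sketch of the standard DPO loss. This is plain DPO, not TAB-PO, whose details the abstract excerpt does not give. The per-token log-probabilities, the function name `dpo_loss`, and `beta=0.1` are all illustrative assumptions, not values from the paper. The sketch shows that when two completions differ in a single token, the shared scaffolding tokens cancel and the whole sequence-level margin rests on that one token.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is a list of per-token log-probabilities (hypothetical values).
    Returns (loss, margin), where margin is the difference of implicit rewards.
    """
    # Sequence-level implicit reward: beta * (log pi(y) - log pi_ref(y)).
    chosen_reward = beta * (sum(policy_chosen) - sum(ref_chosen))
    rejected_reward = beta * (sum(policy_rejected) - sum(ref_rejected))
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    return loss, margin

# Toy pair: five tokens, identical except at index 3 (say, the label token).
# All numbers below are made up for illustration.
ref_chosen      = [-0.1, -0.2, -0.1, -1.2, -0.1]
ref_rejected    = [-0.1, -0.2, -0.1, -1.2, -0.1]
policy_chosen   = [-0.1, -0.2, -0.1, -1.0, -0.1]
policy_rejected = [-0.1, -0.2, -0.1, -1.4, -0.1]

loss, margin = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
print(f"margin={margin:.4f} loss={loss:.4f}")

# With no separation at all, the loss sits at log(2).
flat_loss, flat_margin = dpo_loss(ref_chosen, ref_rejected, ref_chosen, ref_rejected)
print(f"margin={flat_margin:.4f} loss={flat_loss:.4f}")
```

In this toy pair the scaffolding tokens contribute nothing to the margin, so the loss stays close to its zero-margin value of log 2 even though the policy already prefers the correct label token. That is one way to read the "margin collapse" the abstract mentions.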

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques