TAB-PO: Preference Optimization with a Token-Level Adaptive Barrier for Token-Critical Structured Generation
Samah Fodeh, Linhai Ma, Ganesh Puthiaraju, Srivani Talakokkul, Afshan Khan, Ashley Hagaman, Sarah R. Lowe, Aimee Kendall Roundtree

TL;DR
This paper introduces TAB-PO, a preference optimization method that improves language model alignment on token-critical structured prediction tasks by emphasizing important tokens and balancing confidence.
Contribution
The paper proposes TAB-PO, a token-level adaptive barrier method that addresses limitations of DPO in low-separation, importance-skewed settings, improving structured prediction accuracy.
Findings
TAB-PO achieves ~4% relative improvement in micro-F1 over SFT.
It outperforms recent preference-optimization baselines.
It is effective in medical annotation tasks with hierarchical labels and evidence spans.
Abstract
Direct Preference Optimization (DPO) is an offline post-SFT method for aligning language models from preference pairs, with strong results in instruction following and summarization. However, DPO's sequence-level implicit reward can be brittle for token-critical structured prediction settings such as medical annotation, which often exhibit (i) low-separation preference pairs, where chosen and rejected completions differ by minimal edit distance (often 1-3 tokens), and (ii) token-importance skew, where sparse semantic tokens (hierarchical labels and evidence spans) carry disproportionate task importance relative to high-frequency structural tokens (JSON scaffolding). In this regime, standard DPO suffers from margin collapse (insufficient log-probability separation between near-identical preferences), likelihood squeezing (the margin objective shifts the absolute likelihoods of both completions…
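To make the low-separation issue concrete, here is a minimal sketch of the standard sequence-level DPO loss (not TAB-PO itself) computed from per-token log-probabilities. The function name `dpo_loss`, the toy numbers, and the simplification that shared tokens receive identical log-probabilities in both completions are assumptions made for illustration; none of them come from the paper.

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is a list of per-token log-probabilities for a completion,
    under the policy (pol_*) or the frozen reference model (ref_*).
    Returns (loss, margin), where margin is the scaled implicit-reward gap.
    """
    # Sequence-level implicit rewards: policy log-prob minus reference log-prob.
    reward_chosen = sum(pol_chosen) - sum(ref_chosen)
    reward_rejected = sum(pol_rejected) - sum(ref_rejected)
    margin = beta * (reward_chosen - reward_rejected)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    return loss, margin

# Toy pair (hypothetical numbers): ten tokens, nine shared scaffolding tokens
# and one differing label token in the last position.
shared = [-0.1] * 9
pol_c = shared + [-0.5]   # policy slightly prefers the chosen label
pol_r = shared + [-0.9]
ref_c = shared + [-0.7]   # reference is indifferent between the two labels
ref_r = shared + [-0.7]

loss, margin = dpo_loss(pol_c, pol_r, ref_c, ref_r, beta=0.1)
print(f"margin={margin:.4f} loss={loss:.4f}")
```

Under these toy assumptions the nine shared tokens cancel out of the margin, so the whole preference signal rests on a single token's log-probability gap, and every token, critical or structural, is weighted equally in the sum. That is one way to picture the low-separation and importance-skew regime the abstract describes.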
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques
