Token-weighted Direct Preference Optimization with Attention
Chengyu Huang, Zhuohang Li, Sheng-Yen Chou, Claire Cardie

TL;DR
This paper introduces AttentionPO, a content-aware, token-weighted preference optimization method for large language models. It derives token weights from the model's own attention, improving alignment with human preferences at low additional cost.
Contribution
It proposes Token-weighted DPO (TwDPO), a novel token-weighted training objective, and AttentionPO, an attention-based instantiation that makes preference optimization more content-aware and efficient.
Findings
AttentionPO outperforms existing methods on benchmarks, with experimental results showing significant performance improvements.
It is more content-aware and efficient than prior token-level methods, requiring only two extra forward passes.
Abstract
Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of individual tokens. Existing token-level PO methods compute the token weights using either token-position-based heuristic functions or probability estimates given by a separately trained model, which lack robustness and incur extra training cost. In contrast, we propose Token-weighted DPO (TwDPO) -- a novel training objective grounded in token-weighted RL -- and AttentionPO -- an instantiation of TwDPO that uses attention from the LLM itself to estimate token weights. AttentionPO prompts the LLM to serve as a pairwise judge and checks where the model attends when comparing the responses. This design makes AttentionPO content-aware, adjusting weights based on…
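To make the idea of token weighting concrete, the following is a minimal sketch of one plausible token-weighted DPO loss. It is not the paper's exact TwDPO objective, which this summary does not state. It assumes that each token's log-probability ratio between the policy and the reference model is multiplied by a per-token weight before the usual DPO sigmoid loss is applied. The function names, the `beta` default, and the attention normalization (rescaling so the mean weight is 1) are all hypothetical choices made for illustration.

```python
import math


def attention_to_weights(attn_scores):
    """Rescale non-negative attention scores so the mean weight is 1.

    Hypothetical normalization (an assumption, not taken from the paper):
    uniform attention then yields all-ones weights, i.e. standard DPO.
    """
    n = len(attn_scores)
    total = sum(attn_scores)
    if n == 0 or total <= 0:
        return [1.0] * n
    return [n * a / total for a in attn_scores]


def weighted_log_ratio(policy_logps, ref_logps, weights):
    """Weighted sum over tokens of log pi(y_t) - log pi_ref(y_t)."""
    if not (len(policy_logps) == len(ref_logps) == len(weights)):
        raise ValueError("per-token inputs must have the same length")
    return sum(w * (p - r) for w, p, r in zip(weights, policy_logps, ref_logps))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def token_weighted_dpo_loss(chosen, rejected, beta=0.1):
    """Assumed token-weighted variant of the DPO loss for one preference pair.

    `chosen` and `rejected` are (policy_logps, ref_logps, weights) tuples
    holding per-token values for the preferred and dispreferred response.
    """
    margin = beta * (weighted_log_ratio(*chosen) - weighted_log_ratio(*rejected))
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)
```

With all weights equal to 1 this reduces to the standard sequence-level DPO loss, which is the "all tokens treated equally" baseline the abstract describes. Non-uniform weights let tokens that receive more attention contribute more to the preference margin.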
