Averaging log-likelihoods in direct alignment
Nathan Grinsztajn, Yannis Flet-Berliac, Mohammad Gheshlaghi Azar, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist

TL;DR
This paper proposes a length-invariant averaging method for direct alignment of Large Language Models, improving alignment with human preferences by addressing length bias in log-likelihood comparisons.
Contribution
It introduces a novel averaging operator for length-invariance in direct alignment, bridging contrastive and supervised training approaches.
Findings
Averaging log-likelihoods affects generation scores based on length.
The method reveals a trade-off between generation length and quality.
Empirical results demonstrate improved alignment consistency.
Abstract
To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a fine-tuned model directly from a preference dataset without computing a proxy reward function. These methods are built upon contrastive losses involving the log-likelihood of (dis)preferred completions according to the trained model. However, completions have various lengths, and the log-likelihood is not length-invariant. On the other side, the cross-entropy loss used in supervised training is length-invariant, as batches are typically averaged token-wise. To reconcile these approaches, we introduce a principled approach for making direct alignment length-invariant. Formally, we introduce a new averaging operator, to be composed with the…
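The length dependence described in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical model that assigns probability 0.5 to every token, and it uses a plain per-token mean as a stand-in for length normalization. This is not the paper's averaging operator, whose definition is cut off in the truncated abstract above. All function and variable names are illustrative.

```python
import math

def summed_log_likelihood(token_log_probs):
    """Sequence log-likelihood: sum of per-token log-probabilities."""
    return sum(token_log_probs)

def mean_log_likelihood(token_log_probs):
    """Token-wise average, as in a token-averaged cross-entropy loss."""
    return sum(token_log_probs) / len(token_log_probs)

# Hypothetical completions: every token has probability 0.5 under the model,
# so per-token quality is identical and only the length differs.
short_completion = [math.log(0.5)] * 4
long_completion = [math.log(0.5)] * 16

sum_short = summed_log_likelihood(short_completion)
sum_long = summed_log_likelihood(long_completion)
mean_short = mean_log_likelihood(short_completion)
mean_long = mean_log_likelihood(long_completion)

# The summed score penalizes the longer completion purely for its length.
print(f"sum:  short={sum_short:.3f}  long={sum_long:.3f}")
# The per-token mean is the same for both completions.
print(f"mean: short={mean_short:.3f}  long={mean_long:.3f}")
```

In a contrastive loss that compares preferred and dispreferred completions through summed log-likelihoods, a gap like the one between `sum_short` and `sum_long` can come from length alone, which is the bias the paper sets out to remove.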
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Gene expression and cancer classification
Methods: ALIGN
