Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives
Hao Sun, Yunyi Shen, Jean-Francois Ton

TL;DR
This paper critically examines the use of Bradley-Terry models in reward modeling for LLM alignment, providing theoretical foundations, highlighting limitations, and proposing an alternative approach based on order consistency, supported by extensive empirical evaluation.
Contribution
The paper revisits the theoretical basis of BT models in reward modeling, introduces an order-preserving alternative, and empirically compares multiple methods across diverse settings.
Findings
BT models have a solid theoretical foundation but are not necessary for effective reward modeling.
An order-consistent alternative can match or outperform BT models in practice.
Extensive experiments demonstrate the practical viability of the proposed approach across various datasets and models.
Abstract
The Bradley-Terry (BT) model is a common and successful practice in reward modeling for Large Language Model (LLM) alignment. However, it remains unclear why this model, originally developed for multi-player stochastic game matching, can be adopted to convert pairwise response comparisons into reward values and make predictions, especially given that only a limited number of prompt-response pairs are sparsely compared with others. In this paper, we first revisit the foundations of using BT models in reward modeling and establish the convergence rate of BT reward models based on deep neural networks using embeddings, providing a theoretical foundation for their use. Although the BT model is theoretically sound, we argue that it is not a necessary choice from the perspective of downstream optimization, because a reward model only needs to preserve the correct ranking…
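As a rough illustration of the two ideas in the abstract, the sketch below shows the standard Bradley-Terry preference probability (a sigmoid of the reward difference) with its negative log-likelihood, and checks that a strictly increasing transform of the rewards leaves the ranking unchanged. This is not the authors' implementation: the function names (`bt_preference_prob`, `bt_loss`, `ranking`) and the toy reward values are assumptions made purely for illustration.

```python
import numpy as np


def bt_preference_prob(r_chosen, r_rejected):
    """Bradley-Terry probability that the first response beats the second.

    Computed as sigmoid(r_chosen - r_rejected).
    """
    diff = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return 1.0 / (1.0 + np.exp(-diff))


def bt_loss(r_chosen, r_rejected):
    """Mean negative log-likelihood of the observed preferences under BT.

    -log(sigmoid(d)) equals log(1 + exp(-d)), evaluated stably with logaddexp.
    """
    diff = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return float(np.mean(np.logaddexp(0.0, -diff)))


def ranking(rewards):
    """Indices of the responses ordered from highest to lowest reward."""
    return np.argsort(-np.asarray(rewards, dtype=float))


# Toy rewards for three hypothetical responses to one prompt.
rewards = np.array([0.2, 1.5, -0.7])

# Two strictly increasing transforms of the same rewards.
scaled = 3.0 * rewards + 1.0
squashed = np.tanh(rewards)

print("BT loss on one pair:", bt_loss(rewards[1], rewards[2]))
print("ranking (original): ", ranking(rewards))
print("ranking (scaled):   ", ranking(scaled))
print("ranking (squashed): ", ranking(squashed))
```

The reward values change under each transform, and so would the BT probabilities computed from them, but the order of the responses does not. That is the sense in which a downstream procedure that only compares responses depends on the ranking rather than on the exact reward scale.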
Taxonomy
Topics: Decision-Making and Behavioral Economics · Economic and Environmental Valuation
Methods: Balanced Selection
