What Does Preference Learning Recover from Pairwise Comparison Data?
Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar

TL;DR
This paper investigates what the Bradley--Terry model recovers from pairwise comparison data, especially when the data violate the model's assumptions. It formalizes the preference information the data encode and analyzes the factors that affect learning efficiency.
Contribution
It formalizes the preference information in triplet data through the conditional preference distribution (CPRD) and gives conditions under which Bradley--Terry (BT) modeling is appropriate, clarifying what BT learning actually recovers.
Findings
Conditions, stated in terms of the CPRD, under which the BT model is appropriate
Margin and connectivity as factors that influence sample efficiency
A data-centric understanding of preference learning
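As a rough illustration of why connectivity can matter: in the classical Bradley--Terry setting, scores are only comparable across items that are linked by a chain of comparisons. The sketch below checks that condition for a single fixed context. It is one common reading of "connectivity" and an assumption here; the paper's own definition may differ. The function name `comparison_graph_is_connected` and the toy item labels are hypothetical.

```python
def comparison_graph_is_connected(pairs):
    """Return True if every compared item is linked to every other
    by some chain of comparisons (undirected connectivity).

    pairs: iterable of (item_a, item_b) tuples, one per observed comparison.
    Assumption: this is an illustrative notion of connectivity, not
    necessarily the definition used in the paper.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    roots = {find(x) for x in list(parent)}
    return len(roots) <= 1


# Toy usage: A-B and B-C link all three items; A-B and C-D do not.
linked = comparison_graph_is_connected([("A", "B"), ("B", "C")])
split = comparison_graph_is_connected([("A", "B"), ("C", "D")])
```

When the graph splits into separate components, comparison data alone cannot fix the relative scores of items in different components.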
Abstract
Pairwise preference learning is central to machine learning, with recent applications in aligning language models with human preferences. A typical dataset consists of triplets, each made up of a context, a response that is preferred for that context, and a response that is not. The Bradley--Terry (BT) model is the predominant approach, modeling preference probabilities as a function of latent score differences. Standard practice assumes the data follow this model and learns the latent scores accordingly. However, real data may violate this assumption, and it remains unclear what BT learning recovers in such cases. Starting from triplet comparison data, we formalize the preference information it encodes through the conditional preference distribution (CPRD). We give precise conditions for when BT is appropriate for modeling the CPRD, and identify factors governing sample efficiency -- namely, margin and…
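To make "preference probabilities as a function of latent score differences" concrete, here is a minimal sketch of the standard BT form, in which the probability that one response beats another is the logistic function of their score difference. The logistic link is the usual BT choice. The function names, the `score_fn` interface, and the toy scores are hypothetical and are not taken from the paper.

```python
import math


def bt_preference_prob(score_preferred, score_other):
    """Standard BT probability: sigmoid of the latent score difference."""
    diff = score_preferred - score_other
    return 1.0 / (1.0 + math.exp(-diff))


def bt_negative_log_likelihood(score_fn, triplets):
    """Average negative log-likelihood of the observed preferences.

    score_fn: callable (context, response) -> float latent score
              (hypothetical interface for this sketch).
    triplets: list of (context, preferred_response, other_response).
    """
    total = 0.0
    for context, preferred, other in triplets:
        p = bt_preference_prob(score_fn(context, preferred),
                               score_fn(context, other))
        total -= math.log(p)
    return total / len(triplets)


# Toy usage with made-up scores for a single context.
toy_scores = {("q", "good"): 1.0, ("q", "bad"): -1.0}
toy_triplets = [("q", "good", "bad")]
toy_loss = bt_negative_log_likelihood(
    lambda context, response: toy_scores[(context, response)],
    toy_triplets,
)
```

Minimizing this loss over the scores is the usual way BT scores are fit. It presumes the data actually follow the BT form, which is the assumption the abstract calls into question.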
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Constraint Satisfaction and Optimization · Speech and dialogue systems
