Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
Hritik Bansal, John Dang, Aditya Grover

TL;DR
This paper investigates how the choice of feedback type (ratings versus rankings) affects the alignment and evaluation of large language models, revealing significant inconsistencies and biases that impact model assessment.
Contribution
It uncovers the inconsistency between ratings and rankings in feedback, analyzes biases influencing preferences, and demonstrates the impact of feedback protocols on model evaluation.
Findings
Preferences from ratings and rankings disagree 60% of the time.
Annotator biases help explain the disagreement: human annotators rate denser responses higher but prefer accuracy in pairwise judgments.
Ranking-based evaluation favors models trained on rankings data.
Abstract
Aligning large language models (LLMs) with human values and intents critically involves the use of human or AI feedback. While dense feedback annotations are expensive to acquire and integrate, sparse feedback presents a structural design choice between ratings (e.g., score Response A on a scale of 1-7) and rankings (e.g., is Response A better than Response B?). In this work, we analyze the effect of this design choice for the alignment and evaluation of LLMs. We uncover an inconsistency problem wherein the preferences inferred from ratings and rankings disagree significantly, 60% of the time, for both human and AI annotators. Our subsequent analysis identifies various facets of annotator biases that explain this phenomenon; for example, human annotators rate denser responses higher while preferring accuracy during pairwise judgments. To our surprise, we also observe that the choice of feedback…
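To make the inconsistency measurement concrete, the following is a minimal sketch of how a preference can be inferred from two independent ratings and compared against a pairwise ranking label. It is not the authors' code: the function names, the tie handling (equal ratings count as a "tie", which never matches a strict ranking label), and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple


def preference_from_ratings(score_a: int, score_b: int) -> str:
    """Infer a pairwise preference from two independent ratings.

    Assumption: equal scores are reported as "tie" (hypothetical convention).
    """
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "tie"


def disagreement_rate(
    ratings: List[Tuple[int, int]], rankings: List[str]
) -> float:
    """Fraction of pairs where the rating-inferred preference differs
    from the label collected under the ranking protocol."""
    if len(ratings) != len(rankings):
        raise ValueError("ratings and rankings must have the same length")
    if not ratings:
        raise ValueError("at least one comparison is required")
    mismatches = sum(
        preference_from_ratings(a, b) != label
        for (a, b), label in zip(ratings, rankings)
    )
    return mismatches / len(ratings)


# Toy data (hypothetical): 1-7 ratings for (Response A, Response B),
# and the pairwise ranking label for the same pair.
ratings = [(6, 4), (3, 5), (5, 5), (7, 2), (2, 6)]
rankings = ["A", "A", "B", "A", "B"]

rate = disagreement_rate(ratings, rankings)
print(f"disagreement rate: {rate:.2f}")
```

On this toy data, two of the five pairs disagree (one reversed preference and one rating tie), so the sketch reports a rate of 0.40. How ties are counted changes the number, so any comparison with the 60% figure would need the paper's own convention.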
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
