Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models

Hritik Bansal; John Dang; Aditya Grover

arXiv:2308.15812 · cs.LG · February 7, 2024 · 1 citation

TL;DR

This paper investigates how the choice of feedback type—ratings versus rankings—affects the alignment and evaluation of large language models, revealing significant inconsistencies and biases that impact model assessment.

Contribution

It uncovers the inconsistency between ratings and rankings in feedback, analyzes biases influencing preferences, and demonstrates the impact of feedback protocols on model evaluation.

Findings

1. Preferences inferred from ratings and from rankings disagree 60% of the time.
2. Annotator biases shape feedback: human annotators rate denser responses higher, yet favor accuracy in pairwise judgments.
3. Ranking-based evaluation favors models trained on ranking data.
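The last finding is about evaluation: the same model outputs can look better or worse depending on whether judges rate responses independently or compare them directly. The Python sketch below is a minimal illustration, assuming evaluation is reported as a win rate against a fixed reference model and that a tie counts as half a win; the reference setup, tie convention, function names, and toy judgments are all hypothetical and not taken from the paper or its repository.

```python
from typing import Literal, Sequence

Verdict = Literal["win", "tie", "loss"]


def verdict_from_ratings(model_rating: int, reference_rating: int) -> Verdict:
    """Rating protocol: each response is scored on its own; compare the two scores."""
    if model_rating > reference_rating:
        return "win"
    if model_rating < reference_rating:
        return "loss"
    return "tie"


def win_rate(verdicts: Sequence[Verdict]) -> float:
    """Share of comparisons won against the reference, counting a tie as half a win (assumed)."""
    if not verdicts:
        raise ValueError("need at least one comparison")
    points = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(points[v] for v in verdicts) / len(verdicts)


# Hypothetical judgments of one model's responses against a fixed reference model's responses.
rating_pairs = [(6, 5), (4, 4), (3, 5)]  # (model rating, reference rating) on an assumed 1-7 scale
ranking_verdicts: list[Verdict] = ["win", "win", "loss"]  # direct pairwise judgments

rating_based = win_rate([verdict_from_ratings(m, r) for m, r in rating_pairs])
ranking_based = win_rate(ranking_verdicts)
print(rating_based, round(ranking_based, 3))  # 0.5 0.667
```

With these toy judgments, the same three responses score 0.5 under the rating protocol and about 0.667 under the ranking protocol, which shows how the choice of protocol alone can shift a model's apparent standing.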

Abstract

Aligning large language models (LLMs) with human values and intents critically involves the use of human or AI feedback. While dense feedback annotations are expensive to acquire and integrate, sparse feedback presents a structural design choice between ratings (e.g., score Response A on a scale of 1-7) and rankings (e.g., is Response A better than Response B?). In this work, we analyze the effect of this design choice for the alignment and evaluation of LLMs. We uncover an inconsistency problem wherein the preferences inferred from ratings and rankings significantly disagree 60% of the time for both human and AI annotators. Our subsequent analysis identifies various facets of annotator biases that explain this phenomenon; for example, human annotators rate denser responses higher while preferring accuracy during pairwise judgments. To our surprise, we also observe that the choice of feedback…
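To make the inconsistency measurement concrete, here is a minimal Python sketch of one way to turn two independent ratings into a pairwise preference and count how often that preference disagrees with a direct ranking of the same pair. It assumes a 1-7 rating scale, treats equal ratings as a tie, and assumes the ranking protocol also allows a tie; the function names and toy annotations are hypothetical and not taken from the authors' repository.

```python
from typing import Literal, Sequence

Pref = Literal["A", "B", "tie"]


def preference_from_ratings(rating_a: int, rating_b: int) -> Pref:
    """Infer a pairwise preference from two independent ratings (assumed 1-7 scale)."""
    for r in (rating_a, rating_b):
        if not 1 <= r <= 7:
            raise ValueError(f"rating {r} is outside the assumed 1-7 scale")
    if rating_a > rating_b:
        return "A"
    if rating_b > rating_a:
        return "B"
    return "tie"  # equal scores carry no preference (assumed convention)


def disagreement_rate(ratings: Sequence[tuple[int, int]], rankings: Sequence[Pref]) -> float:
    """Fraction of response pairs where the rating-derived preference differs from the ranking."""
    if len(ratings) != len(rankings):
        raise ValueError("ratings and rankings must cover the same response pairs")
    if not ratings:
        raise ValueError("need at least one response pair")
    mismatches = sum(
        preference_from_ratings(a, b) != ranked
        for (a, b), ranked in zip(ratings, rankings)
    )
    return mismatches / len(ratings)


# Hypothetical toy annotations for four (Response A, Response B) pairs.
toy_ratings = [(6, 4), (5, 5), (3, 6), (7, 2)]
toy_rankings: list[Pref] = ["A", "B", "A", "A"]
print(disagreement_rate(toy_ratings, toy_rankings))  # 0.5
```

Here the second pair is a tie under ratings but a win for B under ranking, and the third pair flips outright, so two of four pairs disagree.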

Code & Models

Repositories

hritikbansal/sparse_feedback
PyTorch · Official

Datasets

hbXNov/sparse_feedback
dataset · 31 downloads

Videos

Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques