Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
Shang Liu, Yu Pan, Guanting Chen, Xiaocheng Li

TL;DR
This paper introduces a framework for learning reward models from ordinal feedback, drawing on the wisdom of the crowd to exploit more nuanced human preferences and improve the alignment of large language models.
Contribution
It generalizes the Bradley-Terry model to ordinal feedback, providing a probabilistic framework and theoretical analysis that demonstrate the benefits of fine-grained preference data.
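Here is a minimal sketch of how the Bradley-Terry loss can extend to ordinal labels. It assumes each label is mapped to a number z in [0, 1] and that the reward model is trained with the cross-entropy between z and the BT probability sigmoid(r1 - r2). The five-level scale, its numeric values, and the helper names below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical mapping of a five-level ordinal scale onto [0, 1]; the levels
# and their numeric values are illustrative assumptions, not taken from the paper.
ORDINAL_LEVELS = {
    "significantly better": 1.0,
    "slightly better": 0.75,
    "tie": 0.5,
    "slightly worse": 0.25,
    "significantly worse": 0.0,
}


def log_sigmoid(t: float) -> float:
    """Numerically stable log(sigmoid(t))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))


def ordinal_bt_loss(r1: float, r2: float, z: float) -> float:
    """Cross-entropy between a soft label z in [0, 1] and the BT probability
    sigmoid(r1 - r2) that response 1 is preferred over response 2."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("ordinal label must lie in [0, 1]")
    d = r1 - r2
    # log(1 - sigmoid(d)) equals log(sigmoid(-d)), which avoids log(0) for large d.
    return -(z * log_sigmoid(d) + (1.0 - z) * log_sigmoid(-d))


# Binary feedback is the special case z in {0, 1}: the usual BT loss -log sigmoid(d).
print(ordinal_bt_loss(1.2, 0.3, 1.0))
# A "slightly better" label pulls the reward gap toward logit(0.75), not toward +infinity.
print(ordinal_bt_loss(1.2, 0.3, ORDINAL_LEVELS["slightly better"]))
```

With z = 1, the loss reduces to the standard BT log-loss. At equal rewards, it equals log 2 for every z. For a fixed label z, the loss is minimized at the reward gap logit(z), so an intermediate label asks for a finite margin rather than an ever larger one.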
Findings
Ordinal feedback reduces Rademacher complexity compared to binary feedback.
Fine-grained feedback improves reward model accuracy in various settings.
Incorporating tied preferences enhances reward learning.
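Two small calculations make the findings on ties and fine-grained labels more concrete. Both assume the reward model is trained with the soft-label loss -[z log sigmoid(d) + (1 - z) log sigmoid(-d)] on the reward gap d = r1 - r2.

First, the gradient of that loss with respect to d is sigmoid(d) - z. A tie (z = 0.5) therefore pushes the gap toward zero, whereas a binary-only pipeline discards the sample and it contributes no gradient at all.

Second, consider randomly rounding an ordinal label Z to a binary label B ~ Bernoulli(Z). This keeps E[B] = E[Z], but by the law of total variance, Var(B) = Var(Z) + E[Z(1 - Z)] >= Var(Z). This is offered as an intuition, not as the paper's Rademacher-complexity argument. The uniform five-level label distribution and the helper names are hypothetical.

```python
import math
from fractions import Fraction


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def grad_wrt_gap(d: float, z: float) -> float:
    """Derivative of -[z*log(sigmoid(d)) + (1 - z)*log(sigmoid(-d))] with respect
    to the reward gap d = r1 - r2; it simplifies to sigmoid(d) - z."""
    return sigmoid(d) - z


def variance(dist: dict) -> Fraction:
    """Variance of a finite distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum(p * (v - mean) ** 2 for v, p in dist.items())


def binarize(dist: dict) -> dict:
    """Distribution of B ~ Bernoulli(Z) when Z follows `dist`.
    Random rounding keeps the mean: P(B = 1) = E[Z]."""
    p_one = sum(v * p for v, p in dist.items())
    return {Fraction(1): p_one, Fraction(0): 1 - p_one}


# A tie (z = 0.5) is balanced at d = 0 and shrinks any nonzero gap.
print(grad_wrt_gap(0.0, 0.5))  # 0.0
print(grad_wrt_gap(2.0, 0.5) > 0)  # True: gradient descent lowers r1 - r2

# Hypothetical ordinal labels: five levels 0, 1/4, ..., 1, equally likely.
ordinal = {Fraction(k, 4): Fraction(1, 5) for k in range(5)}
binary = binarize(ordinal)
print(variance(ordinal), variance(binary))  # 1/8 1/4
```

Exact fractions are used so that the variances print as 1/8 and 1/4. In this example, coarsening to binary feedback doubles the label variance without changing its mean.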
Abstract
Learning a reward model (RM) from human preferences has been an important component in aligning large language models (LLMs). The canonical setup of learning RMs from pairwise preference data is rooted in the classic Bradley-Terry (BT) model, which accepts binary feedback: the label states either that Response 1 is better than Response 2 or the opposite. Such a setup inevitably discards potentially useful samples (such as a tie between the two responses) and loses finer-grained information (such as "slightly better"). In this paper, we propose a framework for learning RMs under ordinal feedback, which generalizes binary preference feedback to arbitrary granularity. Specifically, we first identify a marginal unbiasedness condition, which generalizes the assumption of the BT model in the existing binary feedback setting. The condition validates itself via the…
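One way to read the marginal unbiasedness condition is through crowd aggregation. Assume the condition states that the expected ordinal label matches the BT probability, E[Z | x, y1, y2] = sigmoid(r(x, y1) - r(x, y2)). If each of n annotators independently prefers response 1 with probability p = sigmoid(r(x, y1) - r(x, y2)), then the share of votes for response 1 satisfies the condition exactly.

This is an illustrative mechanism, not necessarily how the paper constructs its labels. The value of p and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def crowd_label_distribution(p: Fraction, n: int) -> dict:
    """Distribution of the ordinal label Z = k / n, the share of n independent
    annotators who prefer response 1 when each does so with probability p."""
    return {
        Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }


def expectation(dist: dict) -> Fraction:
    """Mean of a finite distribution given as {value: probability}."""
    return sum(v * w for v, w in dist.items())


# Hypothetical BT probability sigmoid(r(x, y1) - r(x, y2)) for one prompt and pair.
p = Fraction(3, 10)
for n in (1, 2, 4):
    dist = crowd_label_distribution(p, n)
    # n = 1 is binary feedback; larger n gives finer ordinal levels.
    # Prints n, the number of label levels, and the mean (3/10 for every n).
    print(n, len(dist), expectation(dist))
```

With n = 1, the label is binary and the condition is exactly the BT assumption. With n = 4, the label takes five levels (0, 1/4, 1/2, 3/4, 1) and still has mean p. This is how a coarse binary setup and a finer ordinal one can share the same underlying reward.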
Taxonomy
Topics: Diverse Scientific and Economic Studies
Methods: Knowledge Distillation
