ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning
Timo Kaufmann, Yannick Metz, Daniel Keim, Eyke Hüllermeier

TL;DR
ResponseRank introduces a method to learn preference strength from noisy proxy signals, improving data efficiency and robustness in preference modeling for reinforcement learning and language tasks.
Contribution
It presents a novel approach that leverages local relative strength signals for robust preference strength learning, with empirical validation across multiple domains.
Findings
Enhanced sample efficiency in preference learning
Improved robustness to noisy signals
Effective across diverse tasks including language and RL
Abstract
Binary choices, as often used for reinforcement learning from human feedback (RLHF), convey only the direction of a preference. A person may choose apples over oranges and bananas over grapes, but which preference is stronger? Strength is crucial for decision-making under uncertainty and generalization of preference models, but hard to measure reliably. Metadata such as response times and inter-annotator agreement can serve as proxies for strength, but are often noisy and confounded. We propose ResponseRank to address the challenge of learning from noisy strength signals. Our method uses relative differences in proxy signals to rank responses to pairwise comparisons by their inferred preference strength. To control for systemic variation, we compare signals only locally within carefully constructed strata. This enables robust learning of utility differences consistent with…
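The ranking idea in the abstract can be illustrated with a small sketch. The abstract is truncated, so everything concrete here is an assumption rather than the authors' implementation: strata are taken to be annotator IDs, a shorter response time is taken to signal a stronger preference, and a Plackett-Luce likelihood over utility differences is used as the ranking loss. The function names (`rank_within_strata`, `plackett_luce_nll`, `strength_ranking_loss`) are hypothetical.

```python
import numpy as np
from collections import defaultdict


def rank_within_strata(strata, response_times):
    """Group comparison indices by stratum and order each group from
    shortest to longest response time.

    Assumption: a shorter response time indicates a stronger preference,
    so each list runs from (assumed) strongest to weakest comparison.
    """
    groups = defaultdict(list)
    for idx, stratum in enumerate(strata):
        groups[stratum].append(idx)
    return {
        stratum: sorted(idxs, key=lambda i: response_times[i])
        for stratum, idxs in groups.items()
    }


def plackett_luce_nll(strengths):
    """Negative log-likelihood of the given order (strongest first) under a
    Plackett-Luce model whose scores are the predicted strengths."""
    s = np.asarray(strengths, dtype=float)
    nll = 0.0
    for k in range(len(s) - 1):  # the final position contributes log(1) = 0
        tail = s[k:]
        m = tail.max()  # subtract the max for a stable log-sum-exp
        nll -= s[k] - (m + np.log(np.exp(tail - m).sum()))
    return nll


def strength_ranking_loss(utility_chosen, utility_rejected, strata, response_times):
    """Sum the Plackett-Luce loss over strata. The predicted strength of a
    comparison is the utility difference between chosen and rejected items."""
    diffs = np.asarray(utility_chosen, dtype=float) - np.asarray(utility_rejected, dtype=float)
    rankings = rank_within_strata(strata, response_times)
    return sum(plackett_luce_nll(diffs[order]) for order in rankings.values())


# Toy data: five comparisons from two hypothetical annotators "a" and "b".
strata = ["a", "a", "a", "b", "b"]
response_times = [1.0, 3.0, 2.0, 0.5, 4.0]
rejected = [0.0] * 5
consistent = [3.0, 1.0, 2.0, 2.0, 0.5]    # larger gap where the response was faster
inconsistent = [1.0, 3.0, 2.0, 0.5, 2.0]  # larger gap where the response was slower

loss_consistent = strength_ranking_loss(consistent, rejected, strata, response_times)
loss_inconsistent = strength_ranking_loss(inconsistent, rejected, strata, response_times)
```

Because response times are only compared inside a stratum, a uniformly slow annotator and a uniformly fast one contribute rankings on equal footing; only the relative order within each group affects the loss. On the toy data, the utilities whose gaps follow the response-time order receive the lower loss.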
Peer Reviews
Decision: NeurIPS 2025 poster
Strengths:
1. The authors use annotation time intervals as an indicator of preference strength, which allows incorporating preference degrees into the scoring information and should theoretically yield better results.
2. Meanwhile, the authors focus on addressing the lack of metrics for evaluating how well preference models capture preference degrees, introducing a measure based on the correlation coefficient between true scores and trained scores.
Weakness:
1. Lacks training and testing resul
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Emotion and Mood Recognition · Explainable Artificial Intelligence (XAI)
