Reward Learning from Multiple Feedback Types

Yannick Metz; András Geiszl; Raphaël Baur; Mennatallah El-Assady

arXiv:2502.21038·cs.LG·March 3, 2025

TL;DR

This paper explores learning reward models from multiple types of human feedback, demonstrating that diverse feedback sources can improve reward learning and downstream reinforcement learning performance.

Contribution

It introduces a framework for generating and utilizing six different feedback types, advancing reward learning beyond preference-based feedback.

Findings

1. Diverse feedback types enhance reward modeling accuracy.
2. Multi-type feedback improves RL performance compared to preference-only baselines.
3. Empirical evidence supports the potential of multi-source feedback in RLHF.

Abstract

Learning rewards from preference feedback has become an important tool in the alignment of agentic models. Preference-based feedback, often implemented as a binary comparison between multiple completions, is an established method to acquire large-scale human feedback. However, human feedback in other contexts is often much more diverse. Such diverse feedback can better support the goals of a human annotator, and the simultaneous use of multiple sources might be mutually informative for the learning process or carry type-dependent biases for the reward learning process. Despite these potential benefits, learning from different feedback types has yet to be explored extensively. In this paper, we bridge this gap by enabling experimentation and evaluating multi-type feedback in a broad set of environments. We present a process to generate high-quality simulated feedback of six different…
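As a rough illustration of the preference-based setup the abstract starts from, the sketch below scores a binary comparison between two trajectory segments with a Bradley-Terry model, a common choice in preference-based reward learning. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the use of summed per-step rewards, and the Gaussian-noise label simulation are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_return(rewards):
    """Sum of per-step rewards for one trajectory segment."""
    return sum(rewards)


def preference_probability(return_a, return_b):
    """Bradley-Terry probability that segment A is preferred over segment B."""
    return sigmoid(return_a - return_b)


def preference_loss(pred_rewards_a, pred_rewards_b, label):
    """Cross-entropy loss for one comparison.

    label is 1.0 if A was preferred and 0.0 if B was preferred.
    """
    p = preference_probability(
        segment_return(pred_rewards_a), segment_return(pred_rewards_b)
    )
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


def simulated_label(true_return_a, true_return_b, noise_std=0.0, rng=None):
    """Hypothetical simulated annotator: compares (optionally noisy) true returns.

    If no rng is passed, a fresh generator seeded with 0 is created per call.
    """
    rng = rng or random.Random(0)
    noisy_a = true_return_a + rng.gauss(0.0, noise_std)
    noisy_b = true_return_b + rng.gauss(0.0, noise_std)
    return 1.0 if noisy_a > noisy_b else 0.0
```

In a training loop, `preference_loss` would be averaged over many labeled comparisons and minimized with respect to the parameters of the reward model that produces the per-step predictions. Other feedback types would need their own loss terms, which this sketch does not attempt to cover.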

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 8·Confidence 5

Strengths

1. Studying different types of human feedback is extremely important and can promote research development in the RLHF community.
2. Provides methods for generating various types of synthetic feedback for future RLHF research.
3. For the first time, it proposes training with multiple feedback sources while considering human noise.
4. The various analyses of reward models in the main paper and the appendix are comprehensive.

Weaknesses

1. The workload of this paper is substantial, covering many key points, which results in relatively preliminary research on each type of feedback. The characteristics of different feedback types are not well demonstrated. Can you describe several key feedback types or explain which feedback types are more suitable for specific scenarios?
2. The first half of the paper is well-written, but the experimental organization in the latter half is chaotic, making it difficult to draw clear conclusio

Reviewer 02·Rating 6·Confidence 5

Strengths

With the development of RLHF, there has been an exponential growth of research, especially in interdisciplinary applications, such as large language models (LLMs). I appreciate that it is important to have a standard library of feedback types commonly found in established RL frameworks. This is undoubtedly helpful for both new and experienced researchers in this rapidly evolving field. The choice of RL methods (e.g., PPO) and environments used in the paper are standard and widely accepted in the

Weaknesses

1) Limited Analysis of Feedback Types and Noise Effects: The paper provides only a shallow analysis of the results across different feedback types and the impact of adding noise. The authors introduce Gaussian noise as a way to simulate realistic inconsistencies in human feedback, which is a valid approach. However, they assume that the added noise will uniformly challenge the agent’s learning process, yet they provide limited empirical support to demonstrate the nuanced effects of this noise. I

Reviewer 03·Rating 3·Confidence 5

Strengths

- The paper addresses an important problem, which is providing a toolkit/benchmark for people to use to research learning from different types of feedback and how to combine them. The approach is similar to what has already been used to learn from binary preference labels alone, which makes it an easy toolkit for people to pick up and understand how it works.
- The experiments and results demonstrate that accounting for multiple sources of feedback is neither straightforward nor trivial, and wor

Weaknesses

**High-level overview:** There are two main weaknesses of this paper discussed at a high level here, but more details below. The work is valuable and necessary, but the needed level of rigor isn't there yet. (1) the toolkit is not validated against any studies involving humans, therefore it is impossible to know if the conclusions drawn in the paper reflect characteristics of the feedback or of the implementation; (2) the text in the paper, especially in the second half describing expe

Code & Models

Repositories

ymetz/multi-type-feedback
PyTorch·Official

Videos

Reward Learning from Multiple Feedback Types·slideslive

Taxonomy

Methods·Sparse Evolutionary Training