PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning

Brahim Driss; Alex Davey; Riad Akrour

arXiv:2506.13741·cs.AI·June 17, 2025

TL;DR

This paper introduces a population-based approach to preference space exploration in preference-based reinforcement learning, improving diversity, robustness, and efficiency in learning from human feedback.

Contribution

The paper proposes a population-based method that improves exploration and robustness in PbRL, addressing the tendency of existing single-agent approaches to converge prematurely to policies that satisfy only a narrow subset of human preferences.

Findings

01. Enhanced preference exploration with diverse agent populations.

02. Improved robustness to human feedback errors.

03. Better performance in complex reward environments.

Abstract

Preference-based reinforcement learning (PbRL) has emerged as a promising approach for learning behaviors from human feedback without predefined reward functions. However, current PbRL methods face a critical challenge in effectively exploring the preference space, often converging prematurely to suboptimal policies that satisfy only a narrow subset of human preferences. In this work, we identify and address this preference exploration problem through population-based methods. We demonstrate that maintaining a diverse population of agents enables more comprehensive exploration of the preference landscape compared to single-agent approaches. Crucially, this diversity improves reward model learning by generating preference queries with clearly distinguishable behaviors, a key factor in real-world scenarios where humans must easily differentiate between options to provide meaningful…
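The idea of drawing preference queries from a diverse population so that the two options are easy to tell apart can be illustrated with a minimal sketch. This is not the authors' algorithm: the behavior descriptors, the Euclidean distance, and the names `behavior_distance`, `select_query`, and `population` are all assumptions introduced here for illustration only.

```python
import itertools
import math


def behavior_distance(a, b):
    """Euclidean distance between two behavior descriptors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_query(population_segments):
    """Pick the most distinguishable pair of segments from different agents.

    population_segments: dict mapping a (hypothetical) agent id to a list of
    behavior descriptors, one per trajectory segment.
    Returns (((agent_i, seg_i), (agent_j, seg_j)), distance).
    """
    best_pair, best_dist = None, -1.0
    # Compare segments across every pair of distinct agents in the population.
    for (ai, segs_i), (aj, segs_j) in itertools.combinations(
        population_segments.items(), 2
    ):
        for si, di in enumerate(segs_i):
            for sj, dj in enumerate(segs_j):
                d = behavior_distance(di, dj)
                if d > best_dist:
                    best_pair, best_dist = ((ai, si), (aj, sj)), d
    return best_pair, best_dist


# Toy population: two agents behave similarly, one behaves very differently.
population = {
    "agent_a": [(0.0, 0.0), (0.1, 0.2)],
    "agent_b": [(0.2, 0.1)],
    "agent_c": [(3.0, 4.0)],
}
pair, dist = select_query(population)
print(pair, dist)
```

In this toy setup the selected query pairs a segment from `agent_a` with the one from `agent_c`, since those behaviors are farthest apart under the assumed metric. A single agent would only offer the small within-agent differences, which is the kind of hard-to-distinguish query the abstract argues against.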

Taxonomy

Topics: Color perception and design · Data Management and Algorithms · Evolutionary Algorithms and Applications