Listwise Reward Estimation for Offline Preference-based Reinforcement Learning

Heewoong Choi; Sangwon Jung; Hongjoon Ahn; Taesup Moon

arXiv:2408.04190·cs.LG·August 9, 2024

TL;DR

This paper introduces LiRE, a novel offline preference-based reinforcement learning method that utilizes second-order preference information through ranked lists of trajectories, improving reward estimation accuracy and robustness.

Contribution

LiRE is the first approach to incorporate second-order preferences in offline PbRL, enhancing reward estimation by constructing ranked trajectory lists using ternary feedback.
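One way to picture how a ranked trajectory list could be built from ternary feedback is the sketch below. It is not taken from the paper or its repository: it assumes a binary-insertion procedure, represents the list as groups of equally preferred items ordered from least to most preferred, and uses hypothetical names (`build_ranked_list`, `toy_feedback`). Plain integers stand in for trajectories.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def build_ranked_list(
    items: Sequence[T],
    ternary_feedback: Callable[[T, T], int],
) -> List[List[T]]:
    """Insert items one by one into a list of tie groups.

    ternary_feedback(a, b) returns +1 if a is preferred over b,
    -1 if b is preferred over a, and 0 if they are equally preferred.
    Groups are ordered from least preferred to most preferred.
    """
    groups: List[List[T]] = []
    for item in items:
        lo, hi = 0, len(groups)
        placed = False
        while lo < hi:
            mid = (lo + hi) // 2
            # Compare against one representative of the middle group.
            fb = ternary_feedback(item, groups[mid][0])
            if fb == 0:
                groups[mid].append(item)  # tie: join the existing group
                placed = True
                break
            if fb > 0:
                lo = mid + 1  # item ranks above the middle group
            else:
                hi = mid      # item ranks below the middle group
        if not placed:
            groups.insert(lo, [item])  # open a new group at its rank
    return groups


def toy_feedback(a: int, b: int) -> int:
    """Hypothetical labeler: larger integer is preferred, equal is a tie."""
    return (a > b) - (a < b)


ranked = build_ranked_list([3, 1, 2, 1, 5], toy_feedback)
print(ranked)
```

Under this assumption each insertion needs a number of feedback queries that grows logarithmically with the number of groups, which is one plausible reading of "efficiently built".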

Findings

01. LiRE outperforms state-of-the-art baselines in experiments.

02. LiRE is robust to feedback noise and varying feedback quantities.

03. A new offline PbRL dataset was proposed for evaluation.

Abstract

In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) was introduced to address this problem by learning reward models from human feedback. However, existing PbRL methods are limited in that they often overlook second-order preference, which indicates the relative strength of preference. In this paper, we propose Listwise Reward Estimation (LiRE), a novel approach for offline PbRL that leverages second-order preference information by constructing a Ranked List of Trajectories (RLT), which can be built efficiently using the same ternary feedback type as traditional methods. To validate the effectiveness of LiRE, we propose a new offline PbRL dataset that objectively reflects the effect of the estimated rewards. Our extensive experiments on the dataset demonstrate the…
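As a rough illustration of how a ranked list can supply training signal for a reward model, the sketch below expands a list of tie groups into labeled pairs and scores them with a standard Bradley-Terry cross-entropy. Both the pair expansion and the choice of loss are assumptions made for illustration; the abstract does not specify the objective LiRE uses, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def pairs_from_ranked_list(groups: List[List[T]]) -> List[Tuple[T, T, float]]:
    """Expand tie groups (least to most preferred) into (a, b, label) pairs.

    label is the target probability that b is preferred over a:
    0.5 for items in the same group, 1.0 when b sits in a higher group.
    """
    pairs: List[Tuple[T, T, float]] = []
    for i, group in enumerate(groups):
        for a, b in combinations(group, 2):
            pairs.append((a, b, 0.5))
        for higher in groups[i + 1:]:
            for a in group:
                for b in higher:
                    pairs.append((a, b, 1.0))
    return pairs


def bradley_terry_loss(reward_a: float, reward_b: float, label: float) -> float:
    """Cross-entropy between label and the Bradley-Terry probability P(b > a)."""
    p_b = 1.0 / (1.0 + math.exp(reward_a - reward_b))
    return -(label * math.log(p_b) + (1.0 - label) * math.log(1.0 - p_b))


pairs = pairs_from_ranked_list([["a", "b"], ["c"]])
print(pairs)
```

A list of n ranked items yields n(n-1)/2 labeled pairs, including pairs that were never compared directly when the list was built, so the list carries more ordering information than the same number of independent pairwise queries.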

Code & Models

Repositories

chwoong/lire (PyTorch, official)

Taxonomy

Topics: Consumer Market Behavior and Pricing · Wine Industry and Tourism