Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
Heewoong Choi, Sangwon Jung, Hongjoon Ahn, Taesup Moon

TL;DR
This paper introduces LiRE, a novel offline preference-based reinforcement learning method that utilizes second-order preference information through ranked lists of trajectories, improving reward estimation accuracy and robustness.
Contribution
LiRE is the first approach to incorporate second-order preferences in offline PbRL, enhancing reward estimation by constructing ranked trajectory lists using ternary feedback.
Findings
In experiments on the newly proposed dataset, LiRE outperforms state-of-the-art baselines.
LiRE is robust to feedback noise and varying feedback quantities.
The authors propose a new offline PbRL dataset designed to objectively reflect the effect of the estimated rewards.
Abstract
In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) was introduced to address this problem by learning reward models from human feedback. However, existing PbRL methods are limited in that they often overlook second-order preference, which indicates the relative strength of preference. In this paper, we propose Listwise Reward Estimation (LiRE), a novel approach for offline PbRL that leverages second-order preference information by constructing a Ranked List of Trajectories (RLT), which can be built efficiently using the same ternary feedback type as traditional methods. To validate the effectiveness of LiRE, we propose a new offline PbRL dataset that objectively reflects the effect of the estimated rewards. Our extensive experiments on the dataset demonstrate the…
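To make the idea of a Ranked List of Trajectories concrete, here is a minimal sketch of how such a list could be built from ternary feedback alone. The abstract does not spell out the construction procedure, so the binary-search insertion used here is an assumption, and the names `build_ranked_list`, `pairs_from_ranked_list`, and the `compare` callback are hypothetical. Trajectories are represented by plain identifiers, and equally preferred ones share a group.

```python
from typing import Callable, Hashable, List, Tuple

# Ternary feedback (hypothetical interface): +1 if the first item is preferred,
# -1 if the second is preferred, 0 if the two are judged equally preferable.
Feedback = Callable[[Hashable, Hashable], int]


def build_ranked_list(items: List[Hashable], compare: Feedback) -> List[List[Hashable]]:
    """Insert items one by one into a list of groups, least preferred first.

    Assumption: binary search over group representatives decides the position;
    the source only states that the list is built from ternary feedback.
    """
    groups: List[List[Hashable]] = []
    for item in items:
        lo, hi = 0, len(groups)
        placed = False
        while lo < hi:
            mid = (lo + hi) // 2
            result = compare(item, groups[mid][0])  # query one representative
            if result == 0:
                groups[mid].append(item)  # tie: join the existing group
                placed = True
                break
            if result > 0:
                lo = mid + 1  # item is preferred: search the upper half
            else:
                hi = mid  # item is less preferred: search the lower half
        if not placed:
            groups.insert(lo, [item])  # no tie found: open a new group
    return groups


def pairs_from_ranked_list(groups: List[List[Hashable]]) -> List[Tuple[Hashable, Hashable]]:
    """Enumerate every (better, worse) pair implied by the ranked list."""
    pairs = []
    for i, low_group in enumerate(groups):
        for high_group in groups[i + 1:]:
            for worse in low_group:
                for better in high_group:
                    pairs.append((better, worse))
    return pairs


if __name__ == "__main__":
    # Scripted labeler for illustration only: hidden scores with a tie threshold.
    hidden = {"a": 1.0, "b": 5.0, "c": 1.2, "d": 9.0}

    def scripted(x: str, y: str, threshold: float = 0.5) -> int:
        diff = hidden[x] - hidden[y]
        if abs(diff) <= threshold:
            return 0
        return 1 if diff > 0 else -1

    ranked = build_ranked_list(list(hidden), scripted)
    print(ranked)
    print(pairs_from_ranked_list(ranked))
```

In this toy run, the pair ("d", "a") appears in the output even though "d" and "a" were never compared directly: their ordering follows from their positions in the list. The distance between groups is also what carries the relative-strength (second-order) information that independent pairwise labels lack.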
