Binary Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning

Yinglun Xu; David Zhu; Rohan Gumaste; Gagandeep Singh

arXiv:2406.10445·cs.LG·October 25, 2024

TL;DR

This paper introduces a framework that converts preference feedback into scalar rewards through binary reward labeling (BRL), so that existing reward-based offline RL algorithms can be applied directly to preference-labeled datasets.

Contribution

The authors propose a general framework that transforms preference feedback into scalar rewards, allowing any existing reward-based offline RL algorithm to be run on preference-based data. They support it with a theoretical analysis connecting it to several recent PBRL techniques and with empirical evaluation on benchmark datasets.

Findings

01 The framework achieves results comparable to reward-based RL on benchmark datasets.

02 Binary reward labeling minimizes the information lost when preference feedback is converted into reward feedback.

03 Combining reward labeling with various reward-based offline RL algorithms improves offline preference-based RL performance.

Abstract

Offline reinforcement learning has become one of the most practical RL settings. However, most existing work on offline RL focuses on the standard setting with scalar reward feedback. It remains unknown how to universally transfer the existing rich understanding of offline RL from the reward-based to the preference-based setting. In this work, we propose a general framework to bridge this gap. Our key insight is to transform preference feedback into scalar rewards via binary reward labeling (BRL), after which any reward-based offline RL algorithm can be applied to the dataset with the reward labels. Binary reward labeling minimizes the information loss during this feedback signal transition in practical learning scenarios. We theoretically show the connection between several recent preference-based RL (PBRL) techniques and our framework combined with specific offline RL algorithms. By combining reward…
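
The labeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the label values or how they are spread over a trajectory, so everything here is an assumption made for illustration: the `PreferencePair` type, the `binary_reward_labeling` function, the choice of 1.0 for the preferred trajectory and 0.0 for the other, and the decision to copy the label onto every transition are all hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: a transition is a (state, action) pair.
Transition = Tuple[int, int]


@dataclass
class PreferencePair:
    """One preference query: two trajectories and which one was preferred."""
    first: List[Transition]
    second: List[Transition]
    first_preferred: bool


def binary_reward_labeling(
    pairs: List[PreferencePair],
    high: float = 1.0,  # assumed label for the preferred trajectory
    low: float = 0.0,   # assumed label for the non-preferred trajectory
) -> List[Tuple[int, int, float]]:
    """Turn preference pairs into (state, action, reward) tuples.

    Every transition of the preferred trajectory receives `high` and every
    transition of the other trajectory receives `low`. This per-transition
    assignment is an assumption of this sketch.
    """
    labeled: List[Tuple[int, int, float]] = []
    for pair in pairs:
        r_first = high if pair.first_preferred else low
        r_second = low if pair.first_preferred else high
        labeled.extend((s, a, r_first) for s, a in pair.first)
        labeled.extend((s, a, r_second) for s, a in pair.second)
    return labeled


# Usage: one query in which the first trajectory was preferred.
example = [PreferencePair(first=[(0, 1), (1, 0)], second=[(0, 0)], first_preferred=True)]
dataset = binary_reward_labeling(example)
```

The resulting list has the same shape as a reward-labeled offline dataset, which is what lets a reward-based offline RL algorithm consume it without modification.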
