STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization

Yachen Kang; Li He; Jinxin Liu; Zifeng Zhuang; Donglin Wang

arXiv:2307.09692·cs.LG·July 20, 2023

TL;DR

This paper introduces STRAPPER, a novel semi-supervised reinforcement learning method that addresses the 'similarity trap' in preference-based RL by combining self-training and peer regularization, improving reward learning for complex behaviors.

Contribution

It proposes a new approach to preference-based RL that overcomes the similarity trap using self-training and peer regularization, enhancing reward learning with less human effort.

Findings

01 Effective in learning locomotion behaviors

02 Improves reward confidence in semi-supervised settings

03 Addresses the similarity trap issue

Abstract

Preference-based reinforcement learning (PbRL) promises to learn a complex reward function from binary human preferences. However, this human-in-the-loop formulation requires considerable human effort to assign preference labels to segment pairs, hindering large-scale application. Recent approaches have tried to reuse unlabeled segments, which implicitly elucidates the distribution of segments and thereby alleviates the human effort. Consistency regularization has further been considered to improve the performance of semi-supervised learning. However, we notice that, unlike in general classification tasks, in PbRL there exists a unique phenomenon that we define as the similarity trap in this paper. Intuitively, humans can have diametrically opposite preferences for similar segment pairs, and such similarity may trap consistency regularization into failing in PbRL. Due to the existence of similarity…
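For readers new to PbRL, the following is a minimal sketch of how a binary preference label on a segment pair is commonly turned into a training signal for a reward model. It assumes the Bradley-Terry formulation that is widespread in the PbRL literature; the abstract does not state which loss STRAPPER uses, so this is an illustration rather than the authors' method. The function names `preference_prob` and `preference_loss` are hypothetical.

```python
import math

def preference_prob(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Assumption: a Bradley-Terry model over summed per-step predicted rewards.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Sigmoid of the return difference, branched to avoid overflow in exp().
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)

def preference_loss(rewards_a, rewards_b, label):
    """Binary cross-entropy for one labeled pair.

    label is 1.0 if the human preferred A and 0.0 if the human preferred B.
    """
    p = preference_prob(rewards_a, rewards_b)
    p = min(max(p, 1e-8), 1.0 - 1e-8)  # keep log() finite
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

# Two segments with hypothetical predicted per-step rewards.
seg_a = [0.5, 0.7, 0.9]
seg_b = [0.4, 0.6, 0.8]
loss_if_a_preferred = preference_loss(seg_a, seg_b, 1.0)
loss_if_b_preferred = preference_loss(seg_a, seg_b, 0.0)
```

Under this model, two segment pairs with nearly identical predicted returns receive nearly identical preference probabilities, which is one way to see why opposite human labels on similar pairs are awkward for a regularizer that pushes similar inputs toward the same prediction.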

Code & Models

Repositories

rll-research/bpref
none · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Muscle activation and electromyography studies