PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation

Runze Liu; Yali Du; Fengshuo Bai; Jiafei Lyu; Xiu Li

arXiv:2306.03615 · cs.LG · June 6, 2024 · 1 citation

TL;DR

PEARL introduces a zero-shot transfer method for preference-based reinforcement learning in robotics: it aligns preferences across tasks with optimal transport and learns a robust reward model, reducing reliance on human preference labels.

Contribution

The paper presents a novel zero-shot transfer framework that combines cross-task preference alignment via the Gromov-Wasserstein distance with robust reward learning, enabling effective policy learning without any human preference labels for the target task.
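For reference, the standard square-loss Gromov-Wasserstein problem between two sets of trajectories is written below. Here C (n × n) and C' (m × m) are assumed intra-task distance matrices between source and target trajectories, and p, q are assumed weights on those trajectories; the exact cost and weights used in PEARL may differ. Because only within-task distances are compared, the problem is well defined even when the two tasks have different state spaces, and the minimising coupling T* is what plays the role of a trajectory correspondence.

```latex
\mathrm{GW}(C, C', p, q) = \min_{T \in \Pi(p, q)} \sum_{i,k=1}^{n} \sum_{j,l=1}^{m} \left( C_{ik} - C'_{jl} \right)^2 T_{ij} T_{kl},
\qquad
\Pi(p, q) = \left\{ T \in \mathbb{R}_{\ge 0}^{n \times m} : T \mathbf{1}_m = p,\; T^\top \mathbf{1}_n = q \right\}
```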

Findings

01 Outperforms existing methods when only a limited number of human preference labels is available

02 Transfers preferences accurately across diverse tasks

03 Learns well-behaved policies for robotic manipulation

Abstract

In preference-based reinforcement learning (RL), obtaining a large number of preference labels is both time-consuming and costly. Furthermore, human preferences queried for one task cannot be reused for new tasks. In this paper, we propose Zero-shot Cross-task Preference Alignment and Robust Reward Learning (PEARL), which learns policies from cross-task preference transfer without any human labels for the target task. Our contributions include two novel components that facilitate the transfer and learning process. The first is Cross-task Preference Alignment (CPA), which transfers preferences between tasks via optimal transport. The key idea of CPA is to use the Gromov-Wasserstein distance to align trajectories between tasks; the solved optimal transport matrix then serves as the correspondence between trajectories. The target task preferences are computed as the weighted sum of source…
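The CPA idea described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes equal-horizon trajectories, uses Euclidean distances between flattened trajectories as the intra-task costs, and solves an entropic Gromov-Wasserstein problem with log-domain Sinkhorn iterations (a common approximation, swapped in here). Because the abstract is truncated, the transfer rule is also an assumption: a target pair (k, l) receives the average of source labels y[i, j] weighted by T[i, k] * T[j, l]. All function names and the toy data are hypothetical.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    x_max = np.max(x, axis=axis, keepdims=True)
    out = x_max + np.log(np.sum(np.exp(x - x_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def sinkhorn_log(p, q, M, eps, n_iter=200):
    """Entropic OT coupling between weights p and q for cost matrix M (log domain)."""
    log_p, log_q = np.log(p), np.log(q)
    f, g = np.zeros_like(p), np.zeros_like(q)
    for _ in range(n_iter):
        f = eps * (log_p - _logsumexp((g[None, :] - M) / eps, axis=1))
        g = eps * (log_q - _logsumexp((f[:, None] - M) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - M) / eps)


def trajectory_distances(trajs):
    """Hypothetical intra-task cost: distance between flattened trajectories, scaled to [0, 1]."""
    X = trajs.reshape(len(trajs), -1)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    return D / max(D.max(), 1e-12)


def entropic_gw(C1, C2, p, q, eps=0.05, n_outer=50):
    """Entropic square-loss Gromov-Wasserstein coupling via repeated Sinkhorn projections."""
    const = ((C1 ** 2) @ p)[:, None] + ((C2 ** 2) @ q)[None, :]
    T = np.outer(p, q)  # start from the independent coupling
    for _ in range(n_outer):
        # Linearised cost M[i, j] = sum_{k,l} (C1[i,k] - C2[j,l])^2 T[k,l]
        # (exact when T has marginals p and q)
        M = const - 2.0 * C1 @ T @ C2.T
        T = sinkhorn_log(p, q, M, eps)
    M = const - 2.0 * C1 @ T @ C2.T
    return T, float(np.sum(M * T))


def transfer_preferences(T, y_src):
    """Assumed rule: P[k, l] = sum_{i,j} T[i,k] T[j,l] y_src[i,j] / (mass_k * mass_l)."""
    mass = T.sum(axis=0)
    return (T.T @ y_src @ T) / np.outer(mass, mass)


# Toy demo with synthetic data (everything below is hypothetical).
rng = np.random.default_rng(0)
n, horizon = 8, 5
src = rng.normal(size=(n, horizon, 3))           # source task: 3-D states
returns = src[:, :, 0].sum(axis=1)               # stand-in for a scripted teacher's return
y_src = 0.5 * (1.0 + np.sign(returns[:, None] - returns[None, :]))  # y[i, j] = P(i preferred to j)

# Target task: the same motions in a 4-D state space (orthogonal map), in shuffled order.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
lifted = np.concatenate([src, np.zeros((n, horizon, 1))], axis=2) @ Q.T
perm = rng.permutation(n)
tgt = lifted[perm]

p = np.full(n, 1.0 / n)
q = np.full(n, 1.0 / n)
T, gw = entropic_gw(trajectory_distances(src), trajectory_distances(tgt), p, q)
P_tgt = transfer_preferences(T, y_src)  # target-pair labels; no target human labels used
print("GW cost:", round(gw, 4))
print("matched source index per target trajectory:", T.argmax(axis=0), "vs. true", perm)
```

Because the toy target task is an isometric, shuffled copy of the source, `T.argmax(axis=0)` can be compared with `perm`; entropic GW is non-convex and may settle in a local optimum, so agreement is likely but not guaranteed. Under the assumed rule, the transferred labels satisfy P[k, l] + P[l, k] = 1 whenever the source labels do.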

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Residual Connection · Linear Layer · Dropout · Label Smoothing · Adam · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization