Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment

Ran Tian, Yilin Wu, Chenfeng Xu, Masayoshi Tomizuka, Jitendra Malik, and Andrea Bajcsy

arXiv:2412.04835 · cs.RO · December 9, 2024

Open Access

TL;DR

This paper introduces RAPL, a method that efficiently learns visual rewards for robot policies using minimal human preference feedback, significantly reducing the data needed for effective alignment.

Contribution

RAPL is a novel observation-only approach that fine-tunes pre-trained vision encoders to align with user preferences, enabling reward learning with less human feedback.

Findings

1. RAPL outperforms traditional methods in simulation and real robot tasks.
2. It reduces human preference data requirements by 5x.
3. It generalizes across different robot embodiments.

Abstract

Visuomotor robot policies, increasingly pre-trained on large-scale datasets, promise significant advancements across robotics domains. However, aligning these policies with end-user preferences remains a challenge, particularly when the preferences are hard to specify. While reinforcement learning from human feedback (RLHF) has become the predominant mechanism for alignment in non-embodied domains like large language models, it has not seen the same success in aligning visuomotor policies due to the prohibitive amount of human feedback required to learn visual reward functions. To address this limitation, we propose Representation-Aligned Preference-based Learning (RAPL), an observation-only method for learning visual rewards from significantly less human preference feedback. Unlike traditional RLHF, RAPL focuses human feedback on fine-tuning pre-trained vision encoders to align with…
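For readers unfamiliar with preference-based reward learning, the following is a minimal sketch of the Bradley-Terry preference loss commonly used in RLHF-style reward learning. It is not RAPL's method, which the abstract above does not specify in detail. The function names and the plain-number rewards are illustrative assumptions; in a visual reward setting the per-step rewards would come from a learned model over image observations.

```python
import math


def segment_return(rewards):
    """Sum the per-step rewards of one trajectory segment."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    Computed as sigmoid(return_A - return_B), in a numerically stable form.
    """
    diff = segment_return(rewards_a) - segment_return(rewards_b)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy loss for one human comparison.

    label is 1.0 if the human preferred segment A and 0.0 if they preferred B.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


# Hypothetical per-step rewards for two short segments.
seg_a = [0.5, 0.8, 0.9]
seg_b = [0.1, 0.2, 0.3]
loss_if_a_preferred = preference_loss(seg_a, seg_b, 1.0)
loss_if_b_preferred = preference_loss(seg_a, seg_b, 0.0)
```

Minimizing this loss over many labeled comparisons pushes the reward model to score preferred segments higher. Each comparison supplies only a single binary label, which is one reason learning a reward directly from images in this way can demand a large amount of human feedback.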

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning

Methods: Diffusion · ALIGN