TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations

Shuaiyi Huang; Mara Levy; Anubhav Gupta; Daniel Ekpo; Ruijie Zheng; Abhinav Shrivastava

arXiv:2505.06079 · cs.RO · May 12, 2025

Open Access

TL;DR

TREND introduces a tri-teaching framework that combines few-shot demonstrations with multiple reward models to robustly learn from noisy preference feedback in reinforcement learning tasks.

Contribution

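The paper presents a tri-teaching approach in which three reward models are trained simultaneously and each passes its small-loss preference pairs to a peer network. Combined with as few as one to three expert demonstrations, this mitigates label noise in preference-based reinforcement learning.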

Findings

01 Achieves up to a 90% success rate on robotic manipulation tasks, even at high noise levels.

02 Requires as few as one to three expert demonstrations.

03 Remains robust with noise levels as high as 40% in the preference feedback.

Abstract

Preference feedback collected by human or VLM annotators is often noisy, presenting a significant challenge for preference-based reinforcement learning that relies on accurate preference labels. To address this challenge, we propose TREND, a novel framework that integrates few-shot expert demonstrations with a tri-teaching strategy for effective noise mitigation. Our method trains three reward models simultaneously, where each model views its small-loss preference pairs as useful knowledge and teaches such useful pairs to its peer network for updating the parameters. Remarkably, our approach requires as few as one to three expert demonstrations to achieve high performance. We evaluate TREND on various robotic manipulation tasks, achieving up to 90% success rates even with noise levels as high as 40%, highlighting its effective robustness in handling noisy preference feedback. Project…
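The small-loss selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Bradley-Terry style loss, the fixed `keep_ratio`, the cyclic teacher-to-student routing, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import math


def preference_loss(return_a, return_b, label):
    """Cross-entropy loss for one preference pair (assumed Bradley-Terry form).

    return_a, return_b: predicted reward sums for segments A and B.
    label: 1 if B is preferred, 0 if A is preferred.
    """
    z = return_b - return_a
    if label == 0:
        z = -z
    # Numerically stable log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def small_loss_indices(losses, keep_ratio):
    """Indices of the pairs with the smallest loss, treated as likely clean."""
    n_keep = max(1, int(keep_ratio * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return order[:n_keep]


def tri_teaching_selection(per_model_losses, keep_ratio):
    """Map each student model to the pairs chosen by its teacher.

    per_model_losses: one list of per-pair losses per reward model.
    Routing is assumed cyclic: model i teaches model (i + 1) % n.
    """
    n_models = len(per_model_losses)
    selection = {}
    for teacher, losses in enumerate(per_model_losses):
        student = (teacher + 1) % n_models
        selection[student] = small_loss_indices(losses, keep_ratio)
    return selection


# Hypothetical per-pair losses for three reward models on four preference pairs.
losses = [
    [0.1, 2.0, 0.3, 1.5],
    [1.0, 0.2, 0.1, 3.0],
    [0.5, 0.4, 2.0, 0.05],
]
print(tri_teaching_selection(losses, keep_ratio=0.5))
```

Each student would then take its gradient step only on the pairs its teacher selected, so a pair that one model fits poorly (a likely mislabeled pair) is less likely to reach the model it teaches.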

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Mobile Crowdsensing and Crowdsourcing