RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
Charles Xu, Qiyang Li, Jianlan Luo, and Sergey Levine

TL;DR
This paper introduces RLDG, a method that uses reinforcement learning to generate high-quality training data for finetuning generalist robotic policies, yielding up to 40% higher success rates than human demonstrations on precise manipulation tasks.
Contribution
The paper combines reinforcement learning with policy distillation: RL-generated data, rather than human demonstrations, is used to finetune a generalist robotic policy.
Findings
Generalist policies finetuned on RL-generated data achieve up to 40% higher success rates than those finetuned on human demonstrations.
Reinforcement learning data improves generalization to new tasks.
Optimized action distributions and better state coverage drive performance improvements.
Abstract
Recent advances in robotic foundation models have enabled the development of generalist policies that can adapt to diverse tasks. While these models show impressive flexibility, their performance heavily depends on the quality of their training data. In this work, we propose Reinforcement Learning Distilled Generalists (RLDG), a method that leverages reinforcement learning to generate high-quality training data for finetuning generalist policies. Through extensive real-world experiments on precise manipulation tasks like connector insertion and assembly, we demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations, achieving up to 40% higher success rates while generalizing better to new tasks. We also provide a detailed analysis that reveals this performance gain stems from both optimized action distributions and…
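The data flow described in the abstract (train with RL, collect rollouts, finetune a policy on them) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the paper's implementation: the task is a 1-D positioning problem, random-search hill climbing stands in for the RL algorithm, the "generalist" is a single linear weight fitted by least squares, and all function names are hypothetical.

```python
import random

def rollout(gain, x0, steps=20):
    """Toy 1-D task: drive position x toward 0 with action a = -gain * x."""
    x, transitions = x0, []
    for _ in range(steps):
        a = -gain * x
        transitions.append((x, a))  # (state, action) pair
        x = x + a
    return transitions, abs(x) < 0.01  # success if we end near the goal

def episode_return(gain, starts):
    """Negative summed distance to the goal over all episodes (higher is better)."""
    total = 0.0
    for x0 in starts:
        transitions, _ = rollout(gain, x0)
        total -= sum(abs(x) for x, _ in transitions)
    return total

def train_specialist(starts, iters=200, seed=0):
    """Stand-in for RL (assumption): hill climbing on a single policy gain."""
    rng = random.Random(seed)
    gain = 0.1
    best = episode_return(gain, starts)
    for _ in range(iters):
        candidate = gain + rng.gauss(0.0, 0.1)
        score = episode_return(candidate, starts)
        if score > best:  # keep only improvements
            gain, best = candidate, score
    return gain

def collect_dataset(gain, starts):
    """Roll out the trained policy and keep transitions from successful episodes."""
    data = []
    for x0 in starts:
        transitions, success = rollout(gain, x0)
        if success:
            data.extend(transitions)
    return data

def distill(data):
    """Behavior cloning: least-squares fit of a = w * x on the collected data."""
    num = sum(x * a for x, a in data)
    den = sum(x * x for x, _ in data)
    return num / den

starts = [-1.0, -0.5, 0.5, 1.0]
specialist_gain = train_specialist(starts)
dataset = collect_dataset(specialist_gain, starts)
student_w = distill(dataset)  # student policy: a = student_w * x
```

In this toy setting the student simply recovers the trained policy's gain, because the collected actions are consistent and noise-free. The sketch only shows how the three stages connect; it says nothing about how the paper's real-world results were obtained.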
