Learning Diffusion Policy from Primitive Skills for Robot Manipulation
Zhihao Gu, Ming Yang, Difan Zou, Dong Xu

TL;DR
This paper introduces SDP, a diffusion policy framework that leverages primitive skills and visual-language cues to improve robot manipulation, demonstrating superior performance over existing methods in simulation and real-world tasks.
Contribution
The paper presents a skill-conditioned diffusion policy that combines interpretable primitive skills with a vision-language model to improve robot manipulation.
Findings
SDP outperforms state-of-the-art methods in simulation benchmarks.
SDP achieves effective real-world robot manipulation.
Decomposition into primitive skills improves task consistency.
Abstract
Diffusion policies (DP) have recently shown great promise for generating actions in robotic manipulation. However, existing approaches often rely on global instructions to produce short-term control signals, which can result in misalignment in action generation. We conjecture that the primitive skills, referred to as fine-grained, short-horizon manipulations, such as "move up" and "open the gripper", provide a more intuitive and effective interface for robot learning. To bridge this gap, we propose SDP, a skill-conditioned DP that integrates interpretable skill learning with conditional action planning. SDP abstracts eight reusable primitive skills across tasks and employs a vision-language model to extract discrete representations from visual observations and language instructions. Based on them, a lightweight router network is designed to assign a desired primitive skill for each…
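The routing step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single linear layer followed by a softmax, uses random weights, and takes a made-up feature vector in place of the vision-language representations. Only "move up" and "open the gripper" are skill names from the abstract; the other six entries are placeholders, and `make_router` and `route` are hypothetical helper names.

```python
import math
import random

# Two names come from the abstract; the remaining six are placeholders,
# because the abstract does not list all eight primitive skills.
SKILLS = ["move up", "open the gripper"] + [
    f"placeholder_skill_{i}" for i in range(3, 9)
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_router(feature_dim, num_skills, seed=0):
    """Build random weights for a hypothetical single-layer linear router."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 0.1) for _ in range(feature_dim)]
        for _ in range(num_skills)
    ]


def route(weights, features):
    """Return (skill_index, probabilities) for one feature vector."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    probs = softmax(logits)
    skill_index = max(range(len(probs)), key=probs.__getitem__)
    return skill_index, probs


if __name__ == "__main__":
    weights = make_router(feature_dim=4, num_skills=len(SKILLS))
    # Stand-in for fused vision-language features (assumed, not from the paper).
    features = [0.5, -1.0, 0.25, 2.0]
    index, probs = route(weights, features)
    print(SKILLS[index], round(sum(probs), 6))
```

In a skill-conditioned policy of the kind the abstract outlines, the selected index would then serve as the conditioning signal for the action-generating diffusion model; that part is omitted here because the abstract does not specify it.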
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Multimodal Machine Learning Applications
