Learning Generalizable 3D Manipulation With 10 Demonstrations
Yu Ren, Yang Cong, Ronghan Chen, Jiahao Long

TL;DR
This paper introduces a novel framework enabling robots to learn generalizable manipulation skills from only 10 demonstrations, effectively handling spatial variations and outperforming existing methods in success rates.
Contribution
The work presents a new approach combining semantic perception and diffusion-based decision-making, with a spatially equivariant training strategy for limited data scenarios.
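As an illustration only: this summary does not describe how the spatially equivariant training strategy works. One common way to encourage spatial equivariance with few demonstrations is to apply the same rigid transform to the observed point cloud and to the demonstrated end-effector positions, so their relative geometry is unchanged. The sketch below assumes that approach; the function names, the choice of a yaw-plus-planar-shift transform, and the parameter ranges are hypothetical and not taken from the paper.

```python
import numpy as np


def random_yaw_transform(rng, max_angle_rad=np.pi, max_shift=0.1):
    """Sample a rotation about the z-axis and a planar translation (assumed ranges)."""
    theta = rng.uniform(-max_angle_rad, max_angle_rad)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s, c, 0.0],
                  [0.0, 0.0, 1.0]])
    t = np.array([rng.uniform(-max_shift, max_shift),
                  rng.uniform(-max_shift, max_shift),
                  0.0])
    return R, t


def augment_pair(points, action_positions, R, t):
    """Apply one rigid transform to both the observation and the action targets.

    points:           (N, 3) point cloud in the workspace frame
    action_positions: (T, 3) demonstrated end-effector positions in the same frame
    """
    return points @ R.T + t, action_positions @ R.T + t


# Hypothetical demonstration data: 50 scene points and a 4-step action sequence.
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))
actions = rng.normal(size=(4, 3))

R, t = random_yaw_transform(rng)
aug_points, aug_actions = augment_pair(points, actions, R, t)
```

Because the transform is rigid and shared, the distance between any scene point and any action target is the same before and after augmentation, which is the property a policy needs in order to treat a moved object as the same task.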
Findings
60% improvement in success rates over state-of-the-art methods
Effective generalization across object poses and camera viewpoints
Validated on simulation and real-world robotic systems
Abstract
Learning robust and generalizable manipulation skills from demonstrations remains a key challenge in robotics, with broad applications in industrial automation and service robotics. While recent imitation learning methods have achieved impressive results, they often require large amounts of demonstration data and struggle to generalize across different spatial variants. In this work, we present a novel framework that learns manipulation skills from as few as 10 demonstrations, yet still generalizes to spatial variants such as different initial object positions and camera viewpoints. Our framework consists of two key modules: Semantic Guided Perception (SGP), which constructs task-focused, spatially aware 3D point cloud representations from RGB-D inputs; and Spatial Generalized Decision (SGD), an efficient diffusion-based decision-making module that generates actions via denoising. To…
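To make the perception step concrete, here is a minimal sketch of how a task-focused point cloud can be built from an RGB-D frame: depth pixels selected by a semantic mask are back-projected through a pinhole camera model. This is not the paper's SGP implementation. The function name, the assumption that a boolean task mask is already available, and the intrinsics values are all hypothetical.

```python
import numpy as np


def backproject_masked_depth(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-frame 3D points.

    depth: (H, W) depth image in metres (0 marks invalid pixels)
    mask:  (H, W) boolean array, True for task-relevant pixels (assumed given)
    fx, fy, cx, cy: pinhole intrinsics in pixels
    Returns an (N, 3) array of [x, y, z] points.
    """
    # Row index is the image v coordinate, column index is u.
    v, u = np.nonzero(mask & (depth > 0))
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Toy 4x4 frame: a flat surface 2 m away, with the centre 2x2 patch masked in.
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

points = backproject_masked_depth(depth, mask, fx=1.0, fy=1.0, cx=1.5, cy=1.5)
```

Keeping only masked pixels is what makes the representation task-focused: background geometry never enters the cloud that the decision module sees.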
Taxonomy
Topics: Advanced Numerical Analysis Techniques · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques
Methods: travel james · Attentive Walk-Aggregating Graph Neural Network
