3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
Hongyan Zhi, Peihao Chen, Siyuan Zhou, Yubo Dong, Quanxi Wu, Lei Han, Mingkui Tan

TL;DR
This paper introduces a cross-embodiment manipulation framework using a 3D flow world model trained on a large-scale dataset, enabling robots to generalize manipulation skills across diverse tasks and embodiments without hardware-specific training.
Contribution
The paper presents a novel 3D flow world model trained on ManiFlow-110k, integrating language-conditioned prediction and GPT-4o assessment for robust, cross-embodiment robotic manipulation.
Findings
Strong generalization across diverse tasks
Reliable cross-embodiment adaptation
Effective use of 3D optical flow for manipulation planning
Abstract
Manipulation has long been a challenging task for robots, while humans can effortlessly perform complex interactions with objects, such as hanging a cup on a mug rack. A key reason is the lack of a large, uniform dataset for teaching robots manipulation skills. Current robot datasets often record robot actions in different action spaces within simple scenes. This prevents robots from learning a unified and robust action representation across different embodiments and diverse scenes. Observing how humans understand a manipulation task, we find that understanding how objects should move in 3D space is a critical clue for guiding actions. This clue is embodiment-agnostic and applies to humans and different robots alike. Motivated by this, we aim to learn a 3D flow world model from both human and robot manipulation data. This model predicts the future movement of the interacting…
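The abstract argues that knowing how an object should move in 3D is an embodiment-agnostic cue for action. As a minimal illustration (not the authors' method), the sketch below assumes a world model has already produced a per-point 3D flow for the manipulated object and recovers a rigid motion from it in closed form. Any robot could then track that motion with its own controller. The function name `rigid_motion_from_flow`, the list-of-tuples input format, and the restriction to a yaw rotation about the vertical z-axis plus a 3D translation are hypothetical simplifications. The rotation is the standard least-squares 2D Procrustes solution on the horizontal plane.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rigid_motion_from_flow(points: List[Point], flow: List[Point]) -> Tuple[float, Point]:
    """Fit a yaw rotation and 3D translation that best explain a 3D flow field.

    Hypothetical interface (assumption, not from the paper):
      points: object points at the current time step.
      flow:   predicted 3D displacement for each point.
    Returns (yaw, (tx, ty, tz)) minimizing sum ||Rz(yaw) p + t - (p + f)||^2.
    """
    if not points or len(points) != len(flow):
        raise ValueError("points and flow must be non-empty and the same length")
    n = len(points)
    # Target positions = current positions displaced by the predicted flow.
    targets = [tuple(p[k] + f[k] for k in range(3)) for p, f in zip(points, flow)]
    # Centroids of source and target point sets.
    pc = [sum(p[k] for p in points) / n for k in range(3)]
    qc = [sum(q[k] for q in targets) / n for k in range(3)]
    # Closed-form 2D Procrustes on centered xy coordinates:
    # yaw = atan2(sum(p x q), sum(p . q)).
    s_cross = 0.0
    s_dot = 0.0
    for p, q in zip(points, targets):
        px, py = p[0] - pc[0], p[1] - pc[1]
        qx, qy = q[0] - qc[0], q[1] - qc[1]
        s_cross += px * qy - py * qx
        s_dot += px * qx + py * qy
    yaw = math.atan2(s_cross, s_dot)
    c, s = math.cos(yaw), math.sin(yaw)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = qc[0] - (c * pc[0] - s * pc[1])
    ty = qc[1] - (s * pc[0] + c * pc[1])
    tz = qc[2] - pc[2]
    return yaw, (tx, ty, tz)


# Usage: four object points rotated 90 degrees about their centroid and lifted 5 cm.
pts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
flw = [(-1.0, 1.0, 0.05), (-1.0, -1.0, 0.05), (1.0, -1.0, 0.05), (1.0, 1.0, 0.05)]
yaw, t = rigid_motion_from_flow(pts, flw)
print(round(math.degrees(yaw), 3), [round(v, 3) for v in t])
```

Restricting the rotation to yaw keeps the example dependency-free. Recovering full 3D rotations from the same flow would typically use the SVD-based Kabsch algorithm instead.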
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Social Robot Interaction and HRI
