Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation
Yuelei Li, Ge Yan, Annabella Macaluso, Mazeyu Ji, Xueyan Zou, Xiaolong Wang

TL;DR
LMM-3DP is a framework that combines large multimodal model planners with 3D skill policies, enabling robots to perform complex manipulation tasks with improved accuracy and success rates in real-world environments.
Contribution
This work introduces LMM-3DP, a novel integration of LMM-based high-level planning with 3D feature field-based low-level control for robotic manipulation.
Findings
1.45x increase in low-level control success rate
1.5x improvement in high-level planning accuracy
Effective real-world kitchen environment performance
Abstract
Recent advances in the visual reasoning capabilities of large multimodal models (LMMs) and the semantic enrichment of 3D feature fields have expanded the horizons of robotic capabilities. These developments hold significant potential for bridging the gap between high-level reasoning from LMMs and low-level control policies that use 3D feature fields. In this work, we introduce LMM-3DP, a framework that integrates LMM planners and 3D skill policies. Our approach consists of three key perspectives: high-level planning, low-level control, and effective integration. For high-level planning, LMM-3DP supports dynamic scene understanding for environment disturbances, a critic agent with self-feedback, history policy memorization, and reattempts after failures. For low-level control, LMM-3DP utilizes a semantic-aware 3D feature field for accurate manipulation. In aligning high-level and…
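The high-level planning behaviours listed in the abstract (a critic that vets proposals, a memory of past attempts, and reattempts after failure) can be illustrated with a minimal control loop. This is only a sketch under stated assumptions, not the authors' implementation: `PlannerLoop`, `propose`, `critique`, `execute`, and the toy skills are all hypothetical names, and plain Python callables stand in for the LMM planner, the critic agent, and the 3D skill policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (skill name, whether the attempt succeeded); hypothetical memory format
Attempt = Tuple[str, bool]


@dataclass
class PlannerLoop:
    """Hypothetical plan -> critique -> execute -> retry loop."""
    propose: Callable[[str, List[Attempt]], str]   # stand-in for an LMM planner
    critique: Callable[[str, str], bool]           # stand-in for a critic agent
    execute: Callable[[str], bool]                 # stand-in for a low-level skill policy
    max_attempts: int = 3
    history: List[Attempt] = field(default_factory=list)

    def run(self, observation: str) -> bool:
        for _ in range(self.max_attempts):
            # The planner sees the current scene and all earlier attempts.
            skill = self.propose(observation, self.history)
            # The critic may reject a proposal before anything is executed.
            if not self.critique(observation, skill):
                self.history.append((skill, False))
                continue
            ok = self.execute(skill)
            self.history.append((skill, ok))
            if ok:
                return True
        return False


# Toy stand-ins, assumed purely for illustration.
CANDIDATES = ["push drawer", "pull drawer handle"]


def toy_propose(observation: str, history: List[Attempt]) -> str:
    # Skip skills that have already failed, using the attempt memory.
    failed = {skill for skill, ok in history if not ok}
    for skill in CANDIDATES:
        if skill not in failed:
            return skill
    return CANDIDATES[-1]


def toy_critique(observation: str, skill: str) -> bool:
    return True  # accept every proposal in this toy setting


def toy_execute(skill: str) -> bool:
    return skill == "pull drawer handle"  # only one skill works here


loop = PlannerLoop(toy_propose, toy_critique, toy_execute)
succeeded = loop.run("closed drawer on the counter")
```

In this toy run the first proposal fails at execution, the failure is recorded in `history`, and the second proposal avoids the failed skill. A real system would replace the three callables with model queries and a learned policy.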
Taxonomy
Topics: Manufacturing Process and Optimization · Robot Manipulation and Learning · Advanced Numerical Analysis Techniques
