Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation

Yuelei Li; Ge Yan; Annabella Macaluso; Mazeyu Ji; Xueyan Zou; Xiaolong Wang

arXiv:2501.18733 · cs.RO · February 3, 2025

TL;DR

LMM-3DP is a framework that combines large multimodal model planners with 3D skill policies, enabling robots to perform complex manipulation tasks with improved accuracy and success rates in real-world environments.

Contribution

This work introduces LMM-3DP, a novel integration of LMM-based high-level planning with 3D feature field-based low-level control for robotic manipulation.

Findings

01. 1.45x increase in low-level control success rate

02. 1.5x improvement in high-level planning accuracy

03. Effective real-world kitchen environment performance

Abstract

The recent advancements in visual reasoning capabilities of large multimodal models (LMMs) and the semantic enrichment of 3D feature fields have expanded the horizons of robotic capabilities. These developments hold significant potential for bridging the gap between high-level reasoning from LMMs and low-level control policies utilizing 3D feature fields. In this work, we introduce LMM-3DP, a framework that can integrate LMM planners and 3D skill Policies. Our approach consists of three key perspectives: high-level planning, low-level control, and effective integration. For high-level planning, LMM-3DP supports dynamic scene understanding for environment disturbances, a critic agent with self-feedback, history policy memorization, and reattempts after failures. For low-level control, LMM-3DP utilizes a semantic-aware 3D feature field for accurate manipulation. In aligning high-level and…
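The high-level planning loop described in the abstract (propose a plan, have a critic revise it, execute skills, remember outcomes, and reattempt after failures) can be illustrated with a minimal sketch. This is not the authors' implementation: `PlannerState`, `run_task`, and the `propose`, `critique`, and `execute` callables are hypothetical names introduced here, and the LMM planner, critic agent, and 3D skill policy are reduced to plain Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PlannerState:
    # Hypothetical stand-in for history memorization:
    # a log of (skill, succeeded) pairs across all attempts.
    history: List[Tuple[str, bool]] = field(default_factory=list)


def run_task(
    propose: Callable[[str, PlannerState], List[str]],
    critique: Callable[[List[str], PlannerState], List[str]],
    execute: Callable[[str], bool],
    instruction: str,
    max_attempts: int = 3,
) -> bool:
    """Plan, critique, execute, and replan after a failed skill (assumed loop)."""
    state = PlannerState()
    for _ in range(max_attempts):
        # The planner proposes a skill sequence; the critic may revise it.
        plan = critique(propose(instruction, state), state)
        failed = False
        for skill in plan:
            ok = execute(skill)  # placeholder for a low-level skill policy
            state.history.append((skill, ok))
            if not ok:
                failed = True
                break  # stop this attempt and replan with updated history
        if not failed:
            return True
    return False
```

Passing the state back into `propose` and `critique` on each attempt is one simple way to let earlier failures inform the next plan; the paper's actual mechanism may differ.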

Taxonomy

Topics: Manufacturing Process and Optimization · Robot Manipulation and Learning · Advanced Numerical Analysis Techniques