Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
Peiyuan Zhi, Zhiyuan Zhang, Yu Zhao, Muzhi Han, Zeyu Zhang, Zhitian Li, Ziyuan Jiao, Baoxiong Jia, Siyuan Huang

TL;DR
COME-robot leverages GPT-4V for open-vocabulary perception and closed-loop reasoning, enabling autonomous robots to adaptively plan and recover from failures in complex real-world manipulation tasks.
Contribution
This work introduces COME-robot, the first closed-loop robotic system to use GPT-4V for open-ended reasoning and adaptive planning in real-world manipulation.
Findings
35% improvement in task success rate over state-of-the-art methods
Effective failure recovery and long-horizon planning demonstrated
Robust open-vocabulary perception in real-world scenarios
Abstract
Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. In this work, we present COME-robot, the first closed-loop robotic system utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. COME-robot incorporates two key innovative modules: (i) a multi-level open-vocabulary perception and situated reasoning module that enables effective exploration of the 3D environment and target object identification using commonsense knowledge and situated information, and (ii) an iterative closed-loop feedback and restoration mechanism that verifies task feasibility, monitors execution success, and traces failure causes across different modules for robust failure recovery. Through comprehensive experiments involving 8 challenging real-world mobile and tabletop…
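The closed-loop mechanism described in (ii) can be illustrated with a minimal sketch. It is not the authors' implementation: in COME-robot the reasoning is done by GPT-4V, whereas this sketch substitutes a static lookup table from failure cause to recovery skills. All names here (StepResult, ClosedLoopRunner, the skill names, and the failure label "object_not_visible") are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StepResult:
    success: bool
    failure_cause: Optional[str] = None  # hypothetical label, e.g. "object_not_visible"


@dataclass
class ClosedLoopRunner:
    # Hypothetical skill library: skill name -> callable returning a StepResult.
    skills: Dict[str, Callable[[], StepResult]]
    # Stand-in for model-based reasoning: failure cause -> recovery skills.
    recoveries: Dict[str, List[str]]
    max_retries: int = 2
    log: List[str] = field(default_factory=list)

    def run(self, plan: List[str]) -> bool:
        # Feasibility check: every planned step must map to a known skill.
        missing = [s for s in plan if s not in self.skills]
        if missing:
            self.log.append(f"infeasible: {missing}")
            return False
        return all(self._run_step(step) for step in plan)

    def _run_step(self, step: str) -> bool:
        for _ in range(self.max_retries + 1):
            # Execute the skill and monitor its outcome.
            result = self.skills[step]()
            outcome = "ok" if result.success else result.failure_cause
            self.log.append(f"{step}: {outcome}")
            if result.success:
                return True
            # Trace the failure cause to a recovery; give up if none is known.
            fix = self.recoveries.get(result.failure_cause or "")
            if fix is None:
                return False
            for name in fix:
                self.skills[name]()  # simplification: recovery result is ignored
                self.log.append(f"recover: {name}")
        return False


def make_flaky_grasp() -> Callable[[], StepResult]:
    # Simulated skill that fails on its first call and succeeds afterwards.
    calls = {"n": 0}

    def grasp() -> StepResult:
        calls["n"] += 1
        if calls["n"] == 1:
            return StepResult(False, "object_not_visible")
        return StepResult(True)

    return grasp


runner = ClosedLoopRunner(
    skills={
        "navigate": lambda: StepResult(True),
        "look_closer": lambda: StepResult(True),
        "grasp": make_flaky_grasp(),
    },
    recoveries={"object_not_visible": ["look_closer"]},
)
ok = runner.run(["navigate", "grasp"])
print(ok, runner.log)
```

In this toy run the first grasp fails, the runner applies the look_closer recovery, and the retry succeeds. A real system would replace the lookup table with perception-grounded reasoning and would also verify the outcome of each recovery action.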
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · AI in Service Interactions
