Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning
Huy Hoang Nguyen, Minh Nhat Vu, Florian Beck, Gerald Ebmer, Anh Nguyen, Andreas Kugi

TL;DR
This paper introduces a modular, zero-shot framework for language-guided robotic manipulation that enables real-time tracking and trajectory replanning to grasp moving objects smoothly in dynamic environments.
Contribution
The work presents a novel modular framework integrating vision-language models with real-time pose estimation and trajectory planning for dynamic object grasping.
Findings
Achieves up to 30 Hz update rate for pose localization.
Operates at 10 Hz for trajectory optimization.
Successfully grasps moving objects in real-time.
Abstract
Combining a vision module inside a closed-loop control system for a seamless movement of a robot in a manipulation task is challenging due to the inconsistent update rates between the utilized modules. This task is even more difficult in a dynamic environment, e.g., when objects are moving. This paper presents a modular zero-shot framework for language-driven manipulation of (dynamic) objects through a closed-loop control system with real-time trajectory replanning and online 6D object pose localization. We segment an object by leveraging a vision-language model via language commands. Then, guided by natural language commands, a closed-loop system, including unified pose estimation and tracking and online trajectory planning, is utilized to continuously track this object and compute the optimal trajectory in real time. Our proposed zero-shot…
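The mismatch in update rates mentioned in the abstract can be illustrated with a small multi-rate loop. The following is only a minimal sketch, not the authors' implementation: it assumes the 30 Hz pose rate and 10 Hz planning rate listed under Findings, and the functions `track_pose` and `replan`, the 1D motion model, and the 0.5 s horizon are all hypothetical stand-ins for the real pose tracker and trajectory optimizer.

```python
from dataclasses import dataclass

POSE_HZ = 30  # pose localization rate listed under Findings
PLAN_HZ = 10  # trajectory optimization rate listed under Findings


@dataclass
class LoopStats:
    pose_updates: int = 0
    replans: int = 0


def track_pose(t: float) -> float:
    """Hypothetical tracker: object starts at 0.3 m and drifts at 0.1 m/s (1D)."""
    return 0.3 + 0.1 * t


def replan(gripper: float, target: float, horizon: float = 0.5) -> float:
    """Hypothetical planner: velocity that would close the gap over `horizon` seconds."""
    return (target - gripper) / horizon


def run(duration_s: float = 2.0):
    """Simulate the fast pose loop with a slower replanning loop nested inside it."""
    stats = LoopStats()
    dt = 1.0 / POSE_HZ
    steps_per_plan = POSE_HZ // PLAN_HZ  # 3 pose updates per replan
    gripper, velocity, target = 0.0, 0.0, track_pose(0.0)

    for k in range(int(round(duration_s * POSE_HZ))):
        # Fast loop: a fresh pose estimate arrives on every tick.
        target = track_pose(k * dt)
        stats.pose_updates += 1

        # Slow loop: the trajectory is only re-optimized every third tick.
        if k % steps_per_plan == 0:
            velocity = replan(gripper, target)
            stats.replans += 1

        # Between replans the gripper keeps following the last planned velocity.
        gripper += velocity * dt

    return stats, gripper, target


if __name__ == "__main__":
    stats, gripper, target = run()
    print(stats, f"remaining gap: {target - gripper:.3f} m")
```

In this toy setting the gripper keeps moving between replans by following the most recent plan, which is one simple way to keep the motion continuous while the slower module catches up.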
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Path Planning Algorithms · Reinforcement Learning in Robotics
