LLM-GROP: Visually Grounded Robot Task and Motion Planning with Large Language Models

Xiaohan Zhang; Yan Ding; Yohei Hayamizu; Zainab Altaweel; Yifeng Zhu; Yuke Zhu; Peter Stone; Chris Paxton; Shiqi Zhang

arXiv:2511.07727·cs.RO·November 12, 2025

TL;DR

This paper introduces LLM-GROP, a framework that combines large language models and computer vision to improve task and motion planning for mobile manipulation tasks involving multiple objects, especially in complex, real-world scenarios.

Contribution

The paper presents a novel task and motion planning (TAMP) framework that uses LLMs and vision for common sense reasoning and adaptive planning in multi-object rearrangement tasks.

Findings

01. Achieved an 84.4% success rate in real-world object rearrangement tasks.

02. Demonstrated effective integration of LLMs and vision for adaptive planning.

03. Showed potential for improved robot performance in complex environments.

Abstract

Task planning and motion planning are two of the most important problems in robotics, where task planning methods help robots achieve high-level goals and motion planning methods maintain low-level feasibility. Task and motion planning (TAMP) methods interleave the two processes of task planning and motion planning to ensure goal achievement and motion feasibility. Within the TAMP context, we are concerned with the mobile manipulation (MoMa) of multiple objects, where it is necessary to interleave actions for navigation and manipulation. In particular, we aim to compute where and how each object should be placed given underspecified goals, such as "set up dinner table with a fork, knife and plate." We leverage the rich common sense knowledge from large language models (LLMs), e.g., about how tableware is organized, to facilitate both task-level and motion-level planning. In…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Robot Manipulation and Learning