Task-oriented Robotic Manipulation with Vision Language Models

Nurhan Bulus Guran; Hanchi Ren; Jingjing Deng; Xianghua Xie

arXiv:2410.15863·cs.RO·May 21, 2025

TL;DR

This paper introduces a novel framework combining Vision Language Models and structured spatial reasoning to improve robotic manipulation by better understanding spatial relationships and object attributes.

Contribution

The work presents a new integration of VLMs with a spatial reasoning pipeline and a dataset with annotated spatial and attribute information, advancing robot understanding of complex scenes.

Findings

01. Enhanced spatial relationship comprehension in robots

02. Improved object manipulation accuracy

03. First method integrating VLMs with structured spatial reasoning

Abstract

Vision Language Models (VLMs) play a crucial role in robotic manipulation by enabling robots to understand and interpret the visual properties of objects and their surroundings, allowing them to perform manipulation based on this multimodal understanding. Accurately understanding spatial relationships remains a non-trivial challenge, yet it is essential for effective robotic manipulation. In this work, we introduce a novel framework that integrates VLMs with a structured spatial reasoning pipeline to perform object manipulation based on high-level, task-oriented input. Our approach transforms visual scenes into tree-structured representations that encode spatial relations. These trees are subsequently processed by a Large Language Model (LLM) to infer restructured configurations that determine how the objects should be organised for a given high-level task. To…
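
As a rough illustration of the idea in the abstract (a scene encoded as a tree of spatial relations, then handed to an LLM together with a high-level task), the following minimal Python sketch may help. It is not the authors' implementation: the `SceneNode` class, the `serialise` and `build_prompt` helpers, the relation labels, and the prompt wording are all hypothetical, and the LLM call itself is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SceneNode:
    """Hypothetical scene-tree node: one object plus its relation to its parent."""
    name: str
    relation: Optional[str] = None  # e.g. "on", "inside"; None for the root
    children: List["SceneNode"] = field(default_factory=list)


def serialise(node: SceneNode, depth: int = 0) -> str:
    """Render the tree as indented text, one object per line."""
    label = node.name if node.relation is None else f"{node.relation} {node.name}"
    lines = ["  " * depth + label]
    lines.extend(serialise(child, depth + 1) for child in node.children)
    return "\n".join(lines)


def build_prompt(scene: SceneNode, task: str) -> str:
    """Combine the serialised scene and the task into text for an LLM (assumed wording)."""
    return (
        "Current scene (indentation = spatial nesting):\n"
        f"{serialise(scene)}\n\n"
        f"Task: {task}\n"
        "Return the rearranged scene in the same indented format."
    )


# Example scene: an apple on a plate, with the plate and a cup both on a table.
scene = SceneNode("table", children=[
    SceneNode("plate", relation="on", children=[SceneNode("apple", relation="on")]),
    SceneNode("cup", relation="on"),
])
prompt = build_prompt(scene, "clear the plate")
```

In this sketch the tree the LLM returns would be compared with the input tree to decide which objects to move; how the paper actually derives manipulation steps is not stated in the visible part of the abstract.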

Taxonomy

Topics: Robotic Path Planning Algorithms · Robotics and Automated Systems · Robot Manipulation and Learning