T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models

Yiteng Chen; Wenbo Li; Shiyi Wang; Huiping Zhuang; Qingyao Wu

arXiv:2506.19498·cs.RO·June 25, 2025

TL;DR

T-Rex is a framework that adapts spatial representation extraction to each robotic manipulation task using vision-language models, improving spatial understanding and efficiency without additional training.

Contribution

The paper introduces a task-adaptive scheme for spatial representation extraction, addressing the limitations of fixed extraction schemes in VLM-based robotic manipulation.

Findings

01 · Enhanced spatial understanding in real-world robots

02 · Improved efficiency and stability without additional training

03 · Effective adaptation to task complexity

Abstract

Building a general robotic manipulation system capable of performing a wide variety of tasks in real-world settings is challenging. Vision-Language Models (VLMs) have demonstrated remarkable potential in robotic manipulation tasks, primarily due to the extensive world knowledge they gain from large-scale datasets. In this process, Spatial Representations (such as points representing object positions or vectors representing object orientations) act as a bridge between VLMs and real-world scenes, effectively grounding the reasoning abilities of VLMs and applying them to specific task scenarios. However, existing VLM-based robotic approaches often adopt a fixed spatial representation extraction scheme for various tasks, resulting in insufficient representational capability or excessive extraction time. In this work, we introduce T-Rex, a Task-Adaptive Framework for Spatial…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Automated Systems · Robot Manipulation and Learning