RT-Grasp: Reasoning Tuning Robotic Grasping via Multi-modal Large Language Model
Jinxuan Xu, Shiyu Jin, Yutian Lei, Yuqian Zhang, Liangjun Zhang

TL;DR
This paper introduces Reasoning Tuning, a training method that adds a reasoning phase before prediction so that multi-modal Large Language Models can generate accurate, context-aware grasp poses. This extends the use of LLMs in robotics from text-based planning to robot control.
Contribution
The paper proposes Reasoning Tuning, a novel training approach that leverages LLMs' reasoning abilities for numerical predictions in robotic grasping tasks, supported by a new dataset.
Findings
The method is validated on grasping datasets and in real-world experiments.
It improves the accuracy and adaptability of LLMs in robotic grasping.
It bridges the gap between text-based planning and robot control.
Abstract
Recent advances in Large Language Models (LLMs) have showcased their remarkable reasoning capabilities, making them influential across various fields. However, in robotics, their use has primarily been limited to manipulation planning tasks due to their inherent textual output. This paper addresses this limitation by investigating the potential of adopting the reasoning ability of LLMs for generating numerical predictions in robotics tasks, specifically for robotic grasping. We propose Reasoning Tuning, a novel method that integrates a reasoning phase before prediction during training, leveraging the extensive prior knowledge and advanced reasoning abilities of LLMs. This approach enables LLMs, notably with multi-modal capabilities, to generate accurate numerical outputs like grasp poses that are context-aware and adaptable through conversations. Additionally, we present the Reasoning…
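As an illustration of the "reasoning phase before prediction" idea described in the abstract, the sketch below splits a model response into its free-text reasoning and a trailing numerical grasp pose. The response layout (reasoning text followed by a bracketed `[x, y, angle]` triple), the `GraspPose` type, and the `parse_grasp_response` helper are all assumptions made for this example; the paper's actual output format is not given in this summary.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GraspPose:
    """Hypothetical planar grasp: pixel center and gripper rotation."""
    x: float
    y: float
    angle_deg: float


# A signed integer or decimal number.
_NUM = r"(-?\d+(?:\.\d+)?)"
# Assumed format: the response ends with "[x, y, angle]".
_POSE_AT_END = re.compile(
    r"\[\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*\]\s*$"
)


def parse_grasp_response(text: str) -> Tuple[str, GraspPose]:
    """Return (reasoning, pose) from a 'reason first, then predict' response."""
    stripped = text.strip()
    match = _POSE_AT_END.search(stripped)
    if match is None:
        raise ValueError("no grasp pose found at the end of the response")
    # Everything before the bracketed triple is treated as the reasoning phase.
    reasoning = stripped[: match.start()].strip()
    x, y, angle = (float(g) for g in match.groups())
    return reasoning, GraspPose(x, y, angle)


if __name__ == "__main__":
    # Made-up response text, used only to exercise the parser.
    response = (
        "The object is a mug with a handle on the right; "
        "grasping the handle is stable. Grasp pose: [212, 148.5, -30]"
    )
    reasoning, pose = parse_grasp_response(response)
    print(reasoning)
    print(pose)
```

Keeping the reasoning text alongside the parsed pose makes it possible to log or inspect why a grasp was proposed. A response that lacks the trailing triple raises an error instead of silently producing a pose.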
Taxonomy
Topics: Natural Language Processing Techniques · Robot Manipulation and Learning · Multimodal Machine Learning Applications
