QwenGrasp: A Usage of Large Vision-Language Model for Target-Oriented Grasping
Xinyu Chen, Jian Yang, Zonghan He, Haobin Yang, Qi Zhao, Yuhui Shi

TL;DR
QwenGrasp combines a large vision-language model with a 6-DoF grasp neural network so that a robot can understand natural language instructions and perform precise target-oriented 6-DoF grasping in unstructured scenes, improving safety and accuracy.
Contribution
This paper introduces QwenGrasp, a novel model integrating vision-language understanding with grasping neural networks for flexible, instruction-based robotic grasping.
Findings
QwenGrasp accurately grasps target objects based on vague or descriptive instructions.
The model can suspend tasks and provide feedback when instructions are infeasible or irrelevant.
QwenGrasp demonstrates superior comprehension of human intentions in complex scenes.
Abstract
Target-oriented grasping in unstructured scenes under language control is essential for intelligent robot arms. Enabling a robot arm to understand human language and execute the corresponding grasping actions is a pivotal challenge. In this paper, we propose QwenGrasp, a model that combines a large vision-language model with a 6-DoF grasp neural network. QwenGrasp can perform a 6-DoF grasping task on a target object specified by a textual instruction. We design a complete experiment with six-dimension instructions to test QwenGrasp in different cases. The results show that QwenGrasp has a superior ability to comprehend human intention. Even when given vague instructions with descriptive words or instructions containing direction information, it grasps the target object accurately. When QwenGrasp accepts the instruction…
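The abstract describes pairing a vision-language model with a 6-DoF grasp network but does not spell out the interface between them. As a rough illustration only, one plausible way such a combination could work is sketched below: the language model is assumed to return either a bounding box for the target or nothing (when the instruction is infeasible or irrelevant), and the grasp network is assumed to return scored candidates whose centers have been projected into the image. The names `GraspCandidate` and `select_target_grasp`, the box format, and the filtering rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in image pixels.
Box = Tuple[float, float, float, float]


@dataclass
class GraspCandidate:
    """Hypothetical container for one grasp proposal."""
    pixel: Tuple[float, float]  # grasp center projected into the image (assumption)
    score: float                # quality score from the grasp network (assumption)


def select_target_grasp(
    target_box: Optional[Box],
    candidates: List[GraspCandidate],
) -> Optional[GraspCandidate]:
    """Return the best-scoring candidate inside the target box, or None.

    None means the task should be suspended: either the language model
    produced no target, or no grasp candidate lies on the target.
    """
    if target_box is None:
        return None
    x0, y0, x1, y1 = target_box
    inside = [
        c for c in candidates
        if x0 <= c.pixel[0] <= x1 and y0 <= c.pixel[1] <= y1
    ]
    return max(inside, key=lambda c: c.score, default=None)


# Usage with made-up numbers.
candidates = [
    GraspCandidate(pixel=(10.0, 10.0), score=0.9),
    GraspCandidate(pixel=(50.0, 50.0), score=0.6),
    GraspCandidate(pixel=(55.0, 52.0), score=0.8),
]
best = select_target_grasp((40.0, 40.0, 60.0, 60.0), candidates)
```

In this sketch the highest-scoring grasp overall (at pixel `(10, 10)`) is ignored because it lies outside the target box, which is the point of target-oriented selection; returning `None` stands in for the suspend-and-report behavior listed under Findings.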
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Robot Manipulation and Learning
