Grasp What You Want: Embodied Dexterous Grasping System Driven by Your Voice
Junliang Li, Kai Ye, Haolan Kang, Mingxuan Liang, Yuhang Wu, Zhenhua Liu, Huiping Zhuang, Rui Huang, Yongquan Chen

TL;DR
This paper presents EDGS, a novel voice-driven robotic grasping system that integrates vision-language models and human-inspired grasping strategies to improve object interaction in cluttered environments.
Contribution
The paper introduces a new embodied grasping system that combines semantic-object alignment with advanced manipulation techniques driven by voice commands.
Findings
High success rate in complex grasping tasks
Effective integration of voice commands with visual data
Robust and precise grasping strategy demonstrated
Abstract
In recent years, as robotics has advanced, human-robot collaboration has gained increasing importance. However, current robots struggle to fully and accurately interpret human intentions from voice commands alone. Traditional gripper and suction systems often fail to interact naturally with humans, lack advanced manipulation capabilities, and are not adaptable to diverse tasks, especially in unstructured environments. This paper introduces the Embodied Dexterous Grasping System (EDGS), designed to tackle object grasping in cluttered environments for human-robot interaction. We propose a novel approach to semantic-object alignment using a Vision-Language Model (VLM) that fuses voice commands and visual information, significantly enhancing the alignment of multi-dimensional attributes of target objects in complex scenarios. Inspired by human hand-object interactions, we develop a robust,…
Taxonomy
Topics: Robot Manipulation and Learning · Evolutionary Algorithms and Applications
