DextER: Language-driven Dexterous Grasp Generation with Embodied Reasoning
Junha Lee, Eunha Park, Minsu Cho

TL;DR
DextER introduces a novel embodied reasoning approach for language-driven dexterous grasp generation, predicting contact points to improve physical plausibility and control in multi-finger manipulation.
Contribution
The paper proposes contact-based embodied reasoning with an autoregressive model to enhance grasp generation from language instructions, outperforming existing methods.
Findings
Achieves a 67.14% success rate on DexGYS, surpassing the previous state of the art by 3.83 percentage points.
Provides a 96.4% improvement in intention alignment over previous approaches.
Enables steerable grasp generation through partial contact specification.
Abstract
Language-driven dexterous grasp generation requires models to understand task semantics, 3D geometry, and complex hand-object interactions. While vision-language models have been applied to this problem, existing approaches directly map observations to grasp parameters without intermediate reasoning about physical interactions. We present DextER, Dexterous Grasp Generation with Embodied Reasoning, which introduces contact-based embodied reasoning for multi-finger manipulation. Our key insight is that predicting which hand links contact where on the object surface provides an embodiment-aware intermediate representation, bridging task semantics with physical constraints. DextER autoregressively generates embodied contact tokens specifying which finger links contact where on the object surface, followed by grasp tokens encoding the hand configuration. On DexGYS, DextER achieves 67.14%…
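The generation order described in the abstract (contact tokens first, then grasp tokens) and the steerable variant listed under Findings can be illustrated with a toy decoding loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the token layout `(kind, name, quantized value)`, the function names `generate_grasp_sequence` and `toy_next_token`, the link name `index_tip`, and the sequence lengths are all hypothetical, and the learned model is replaced by a deterministic stand-in.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical token layout: (kind, name, quantized value).
# The paper's actual tokenization is not given in this summary.
Token = Tuple[str, str, int]


def generate_grasp_sequence(
    next_token: Callable[[Sequence[Token], str], Token],
    num_contact_tokens: int,
    num_grasp_tokens: int,
    contact_prefix: Sequence[Token] = (),
) -> List[Token]:
    """Decode contact tokens first, then grasp tokens, one token at a time.

    `contact_prefix` holds user-specified contacts (partial contact
    specification); the model only fills in the remaining contact slots.
    """
    if len(contact_prefix) > num_contact_tokens:
        raise ValueError("prefix has more contacts than the sequence allows")

    sequence: List[Token] = list(contact_prefix)

    # Stage 1: complete the contact tokens, conditioned on the prefix.
    while len(sequence) < num_contact_tokens:
        sequence.append(next_token(sequence, "contact"))

    # Stage 2: grasp tokens, conditioned on all contact tokens.
    for _ in range(num_grasp_tokens):
        sequence.append(next_token(sequence, "grasp"))

    return sequence


def toy_next_token(context: Sequence[Token], kind: str) -> Token:
    """Deterministic stand-in for a learned autoregressive model."""
    index = len(context)
    return (kind, f"{kind}_{index}", index % 8)


# Steer generation by fixing one contact and letting the model fill the rest.
user_contacts = [("contact", "index_tip", 5)]
seq = generate_grasp_sequence(
    toy_next_token,
    num_contact_tokens=3,
    num_grasp_tokens=4,
    contact_prefix=user_contacts,
)
for token in seq:
    print(token)
```

Because every later token is conditioned on the tokens before it, fixing some contact tokens up front constrains both the remaining contacts and the final hand configuration. The sketch uses that property to illustrate how partial contact specification can steer the result.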
