DextER: Language-driven Dexterous Grasp Generation with Embodied Reasoning

Junha Lee; Eunha Park; Minsu Cho

arXiv:2601.16046·cs.RO·April 28, 2026

TL;DR

DextER introduces a novel embodied reasoning approach for language-driven dexterous grasp generation, predicting contact points to improve physical plausibility and control in multi-finger manipulation.

Contribution

The paper proposes contact-based embodied reasoning, implemented with an autoregressive model, to improve grasp generation from language instructions. It outperforms prior methods on DexGYS.

Findings

01

Achieves a 67.14% success rate on DexGYS, surpassing the previous state of the art by 3.83 percentage points.

02

Provides a 96.4% improvement in intention alignment over previous approaches.

03

Enables steerable grasp generation through partial contact specification.

Abstract

Language-driven dexterous grasp generation requires models to understand task semantics, 3D geometry, and complex hand-object interactions. While vision-language models have been applied to this problem, existing approaches directly map observations to grasp parameters without intermediate reasoning about physical interactions. We present DextER, Dexterous Grasp Generation with Embodied Reasoning, which introduces contact-based embodied reasoning for multi-finger manipulation. Our key insight is that predicting which hand links contact where on the object surface provides an embodiment-aware intermediate representation, bridging task semantics with physical constraints. DextER autoregressively generates embodied contact tokens specifying which finger links contact where on the object surface, followed by grasp tokens encoding the hand configuration. On DexGYS, DextER achieves 67.14%…
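The output ordering described in the abstract (contact tokens first, grasp tokens second) can be illustrated with a minimal sketch. This is not the authors' tokenizer: the token names, the 8-bin quantization, the normalized [0, 1] coordinates, and the helper functions below are all assumptions made for illustration. The sketch also shows how a partial contact specification could serve as a prefix that an autoregressive model would continue, which is one plausible reading of the steerable generation mentioned in the findings.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

NUM_BINS = 8  # hypothetical number of quantization bins per value


@dataclass(frozen=True)
class Contact:
    """One hand-link contact (hypothetical representation)."""
    link: int                        # index of the hand link making contact
    xyz: Tuple[float, float, float]  # contact point, assumed normalized to [0, 1]


def quantize(value: float, bins: int = NUM_BINS) -> int:
    """Map a value in [0, 1] to an integer bin index in [0, bins - 1]."""
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * bins), bins - 1)


def contact_tokens(contacts: Sequence[Contact]) -> List[str]:
    """Encode which link touches where as a flat token list."""
    tokens = ["<contact>"]
    for c in contacts:
        tokens.append(f"link_{c.link}")
        tokens.extend(f"pos_{quantize(v)}" for v in c.xyz)
    tokens.append("</contact>")
    return tokens


def grasp_tokens(joint_values: Sequence[float]) -> List[str]:
    """Encode a hand configuration (assumed normalized joint values)."""
    return ["<grasp>"] + [f"q_{quantize(v)}" for v in joint_values] + ["</grasp>"]


def build_sequence(contacts: Sequence[Contact],
                   joint_values: Sequence[float]) -> List[str]:
    """Full target sequence: contact tokens first, then grasp tokens."""
    return contact_tokens(contacts) + grasp_tokens(joint_values)


def steering_prefix(contacts: Sequence[Contact]) -> List[str]:
    """Partial contact specification: an open contact block that a model
    would be asked to continue (closing tag deliberately left off)."""
    return contact_tokens(contacts)[:-1]


if __name__ == "__main__":
    thumb = Contact(link=0, xyz=(0.1, 0.5, 0.99))
    print(build_sequence([thumb], [0.0, 1.0]))
    print(steering_prefix([thumb]))
```

Placing the contact block before the grasp block means that, in an autoregressive decoder, every grasp token is conditioned on the contacts already generated or supplied by the user.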
