SpaceDex: Generalizable Dexterous Grasping in Tiered Workspaces
Wensheng Wang, Chuanjun Guo, Wei Wei, Tong Wu, and Ning Tan

TL;DR
SpaceDex introduces a hierarchical framework that combines vision-language planning with arm-hand feature separation to improve dexterous grasping in constrained 3D environments, reaching a 63.0% real-world success rate on unseen objects against 39.0% for a strong tabletop baseline.
Contribution
The paper presents a novel hierarchical approach with an arm-hand feature separation network and multi-view perception for generalizable grasping in tiered workspaces.
Findings
Achieved a 63.0% success rate in real-world trials with unseen objects.
Outperformed a strong tabletop baseline, which reached a 39.0% success rate.
Demonstrated robustness to partial observability and off-nominal contacts.
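To put the two reported success rates side by side, the short Python sketch below computes the absolute and relative gain. It assumes the 39.0% figure is the baseline's own success rate, which is an interpretation of the findings above; the function name `improvement` is illustrative and does not come from the paper.

```python
def improvement(ours: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = ours - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Success rates as reported in the findings (assumed: 39.0% is the baseline).
abs_gain, rel_gain = improvement(63.0, 39.0)
print(f"absolute: {abs_gain:.1f} pp, relative: {rel_gain:.1f}%")
# prints "absolute: 24.0 pp, relative: 61.5%"
```

Under that reading, SpaceDex is 24.0 percentage points ahead of the baseline, or roughly 1.6 times its success rate.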
Abstract
Generalizable grasping with high-degree-of-freedom (DoF) dexterous hands remains challenging in tiered workspaces, where occlusion, narrow clearances, and height-dependent constraints are substantially stronger than in open tabletop scenes. Most existing methods are evaluated in relatively unoccluded settings and typically do not explicitly model the distinct control requirements of arm navigation and hand articulation under spatial constraints. We present SpaceDex, a hierarchical framework for dexterous manipulation in constrained 3D environments. At the high level, a Vision-Language Model (VLM) planner parses user intent, reasons about occlusion and height relations across multiple camera views, and generates target bounding boxes for zero-shot segmentation and mask tracking. This stage provides structured spatial guidance for downstream control instead of relying on single-view…
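The hand-off described in the abstract, where a planner emits per-view target bounding boxes that then prompt segmentation and tracking, can be pictured with a minimal Python sketch. Everything here is a hypothetical illustration, not the authors' interface: the `TargetBox` type, the planner signature, the view names, and the `stub_planner` stand-in for a VLM call are all assumptions. The sketch only shows one plausible way to sanitize and group boxes by camera view before a downstream segmenter consumes them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TargetBox:
    """Hypothetical planner output: a pixel-space box in one camera view."""
    view: str
    x0: int
    y0: int
    x1: int
    y1: int


# Hypothetical planner signature: (instruction, view names) -> boxes.
Planner = Callable[[str, List[str]], List[TargetBox]]


def clip_box(box: TargetBox, width: int, height: int) -> TargetBox:
    """Clamp a box to the image bounds before passing it downstream."""
    return TargetBox(
        box.view,
        max(0, min(box.x0, width)),
        max(0, min(box.y0, height)),
        max(0, min(box.x1, width)),
        max(0, min(box.y1, height)),
    )


def plan_prompts(
    planner: Planner,
    instruction: str,
    view_sizes: Dict[str, Tuple[int, int]],
) -> Dict[str, List[TargetBox]]:
    """Group clipped planner boxes by view; skip unknown views and empty boxes."""
    prompts: Dict[str, List[TargetBox]] = {name: [] for name in view_sizes}
    for box in planner(instruction, list(view_sizes)):
        if box.view not in view_sizes:
            continue
        width, height = view_sizes[box.view]
        clipped = clip_box(box, width, height)
        if clipped.x1 > clipped.x0 and clipped.y1 > clipped.y0:
            prompts[box.view].append(clipped)
    return prompts


def stub_planner(instruction: str, views: List[str]) -> List[TargetBox]:
    """Stand-in for a VLM call; returns fixed boxes for illustration only."""
    return [
        TargetBox("front", 100, 50, 700, 300),  # extends past the right edge
        TargetBox("side", 10, 10, 10, 40),      # zero width, will be dropped
    ]


prompts = plan_prompts(
    stub_planner,
    "pick the mug on the middle shelf",
    {"front": (640, 480), "side": (640, 480)},
)
print(prompts)
```

Keeping the planner behind a plain callable means a real VLM client could be swapped in for the stub without touching the grouping logic; how SpaceDex actually structures this interface is not stated in the abstract.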
