TL;DR
BLaDA is an interpretable zero-shot framework that grounds open-vocabulary instructions for functional dexterous manipulation in unstructured environments, integrating semantic understanding, spatial reasoning, and pose execution.
Contribution
BLaDA introduces a structured reasoning chain, together with modules for pose-consistent spatial reasoning and control, that enables zero-shot functional manipulation without predefined affordance labels.
Findings
Outperforms existing methods in affordance grounding precision.
Achieves higher success rates in diverse functional manipulation tasks.
Demonstrates effective open-vocabulary instruction grounding in complex benchmarks.
Abstract
In unstructured environments, functional dexterous grasping calls for the tight integration of semantic understanding, precise 3D functional localization, and physically interpretable execution. Modular hierarchical methods are more controllable and interpretable than end-to-end VLA approaches, but existing ones still rely on predefined affordance labels and lack the tight semantic-pose coupling needed for functional dexterous manipulation. To address this, we propose BLaDA (Bridging Language to Dexterous Actions in 3DGS fields), an interpretable zero-shot framework that grounds open-vocabulary instructions as perceptual and control constraints for functional dexterous manipulation. BLaDA establishes an interpretable reasoning chain by first parsing natural language into a structured sextuple of manipulation constraints via a Knowledge-guided Language Parsing (KLP) module. To achieve…
