DexMachina: Functional Retargeting for Bimanual Dexterous Manipulation
Zhao Mandi, Yifan Hou, Dieter Fox, Yashraj Narang, Ajay Mandlekar, Shuran Song

TL;DR
DexMachina introduces a curriculum-based algorithm for learning dexterous, bimanual manipulation policies from human demonstrations, effectively handling complex tasks with articulated objects and large action spaces.
Contribution
The paper presents DexMachina, a novel approach using virtual object controllers with decaying strength, and provides a simulation benchmark for evaluating dexterous manipulation policies.
Findings
DexMachina outperforms baseline methods in diverse tasks.
The benchmark enables functional comparison of hardware designs.
The approach facilitates understanding hardware capabilities for dexterous manipulation.
Abstract
We study the problem of functional retargeting: learning dexterous manipulation policies to track object states from human hand-object demonstrations. We focus on long-horizon, bimanual tasks with articulated objects, which is challenging due to large action space, spatiotemporal discontinuities, and embodiment gap between human and robot hands. We propose DexMachina, a novel curriculum-based algorithm: the key idea is to use virtual object controllers with decaying strength: an object is first driven automatically towards its target states, such that the policy can gradually learn to take over under motion and contact guidance. We release a simulation benchmark with a diverse set of tasks and dexterous hands, and show that DexMachina significantly outperforms baseline methods. Our algorithm and benchmark enable a functional comparison for hardware designs, and we present key findings…
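The decaying virtual object controller can be illustrated with a small sketch. It assumes the controller is a PD law on a single object degree of freedom, and that its strength decays exponentially only while the policy's reward stays above a threshold (the "reward-gated" decay that one of the reviews below describes). The class name, gain values, decay factor, and threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class VirtualObjectController:
    """Hypothetical PD controller that pulls an object toward its demonstrated state."""

    kp: float = 100.0               # proportional gain (assumed value)
    kd: float = 10.0                # derivative gain (assumed value)
    decay: float = 0.5              # multiplicative decay per curriculum step (assumed)
    reward_threshold: float = 0.8   # decay only when the policy does well enough (assumed)
    min_scale: float = 1e-3         # below this strength the controller is switched off
    scale: float = 1.0              # current strength, starts at full assistance

    def force(self, pos, vel, target_pos, target_vel=0.0):
        """PD force on one object degree of freedom, scaled by the current strength."""
        pd_term = self.kp * (target_pos - pos) + self.kd * (target_vel - vel)
        return self.scale * pd_term

    def update(self, mean_reward):
        """Reward-gated exponential decay: weaken the controller only if reward is high."""
        if mean_reward >= self.reward_threshold:
            self.scale *= self.decay
            if self.scale < self.min_scale:
                self.scale = 0.0  # the policy now controls the object on its own
        return self.scale


ctrl = VirtualObjectController()
assist = ctrl.force(pos=0.0, vel=0.0, target_pos=0.1)  # full-strength assistance
ctrl.update(mean_reward=0.5)  # below threshold: strength is unchanged
ctrl.update(mean_reward=0.9)  # above threshold: strength is halved
```

Gating the decay on reward means the assistance is withdrawn only once the policy can track the object at the current strength, which is one plausible way to realize the "gradually learn to take over" behavior described in the abstract.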
Peer Reviews
Decision · Submitted to ICLR 2026
1. **High Domain Foresight (Dual-Arm Dexterous Hand)**: The research demonstrates strong foresight by tackling the manipulation of articulated objects using sophisticated dual-arm dexterous hands, addressing a critical and complex area of robot manipulation.
2. **Novel Curriculum-Based RL Framework**: The method introduces a curriculum-based Reinforcement Learning approach that successfully mitigates the inherent instability and local optima issues common in single-framework RL learning, offeri

1. **Lack of Real-Robot Validation**: The most significant drawback is the absence of real-robot experiments. Since comparable baselines like ObjDex and ManipTrans include hardware validation to support their claims, DexMachina's reliance solely on simulation limits the persuasiveness of its reported performance leadership.
2. **Reward Function Complexity and Generalization Risk**: The design utilizes numerous auxiliary reward components, which is common in RL but makes the method highly depend
The key insight is interesting: using a curriculum to stabilize early training in long-horizon, contact-rich tasks via PD control on pose and articulation with reward-gated exponential decay. The long-horizon performance looks strong qualitatively, and the method also outperforms the non-curricular baseline, which supports the insight. The benchmark is comprehensive: it covers six robot hands that differ in actuation and size.
The policy relies on a single demonstration and must be retrained for each task or object, making it impractical even for retargeting. The benchmark covers few objects and demonstrations, with no explicit tests on unseen objects, demonstrations, or cross-task transfer; only a study across different hand embodiments is provided. The approach also assumes high-quality MANO-based hand motion data, which is often unrealistic in real-world capture settings, further limiting its applicability for practical reta
- Well-motivated method. Introducing the object virtual controller to ease the optimization problem is a well-motivated and interesting idea.
- Thorough experimental studies. The authors conduct a series of experiments using different types of robot hands to validate the broad effectiveness of the proposed method. They also conduct a detailed study comparing the functionality of various robot hands, which provides valuable insights.
- Good presentation.

- Limited effectiveness. In the qualitative results, effectiveness is only showcased on relatively easy examples with bulky objects. Effectiveness on harder examples, such as thin objects that must be grasped from the table (like those shown in DexTrack), is not demonstrated. In addition, the final hand motion is quite unnatural compared to prior works, including ManipTrans and DexTrack.
- [Minor for ICLR submission] No real-world experiments. The effectiveness is not validated
Taxonomy
Topics: Motor Control and Adaptation · Muscle activation and electromyography studies · Virtual Reality Applications and Impacts
Methods: Focus · Sparse Evolutionary Training
