DexSim2Real: Foundation Model-Guided Sim-to-Real Transfer for Generalizable Dexterous Manipulation

Zijian Zeng, Fei Ding, Huiming Yang, Xianwei Li, and Yuhao Liao

arXiv:2605.05241 · cs.RO · May 8, 2026

TL;DR

DexSim2Real is a framework that uses vision-language foundation models to improve sim-to-real transfer for dexterous manipulation. It combines a visual realism critic for tuning the simulator, a cross-modal visuo-tactile policy, and curriculum learning.

Contribution

The paper presents an integrated approach using foundation models for domain randomization, visuo-tactile policy fusion, and task curriculum to enhance transferability across diverse manipulation tasks.
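
To make the idea of visuo-tactile fusion concrete, the sketch below shows a single-head cross-attention step in NumPy. It is only an illustration under stated assumptions, not the paper's policy: the token counts, the embedding width, the random weights, and the choice to let tactile tokens query visual tokens are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over all context tokens."""
    q = queries @ w_q                          # (n_queries, d)
    k = context @ w_k                          # (n_context, d)
    v = context @ w_v                          # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product similarity
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model = 16                                        # hypothetical embedding width
tactile_tokens = rng.normal(size=(5, d_model))      # hypothetical: one token per fingertip sensor
visual_tokens = rng.normal(size=(49, d_model))      # hypothetical: 7x7 grid of image patch tokens

# Random projections stand in for learned weights.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3))

# Assumption: tactile tokens act as queries and visual tokens as keys/values.
attended, attn = cross_attention(tactile_tokens, visual_tokens, w_q, w_k, w_v)
fused = tactile_tokens + attended                   # residual connection keeps the tactile signal
```

Each row of `attn` describes how strongly one tactile token draws on each image patch, which is the mechanism that lets contact information select the relevant part of the visual scene.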

Findings

01. Achieves 78.2% success rate on real robots across six tasks.

02. Reduces sim-to-real gap to 8.3%, outperforming prior methods.

03. Demonstrates effectiveness of foundation model-guided domain randomization.

Abstract

Sim-to-real transfer remains a critical bottleneck for deploying dexterous manipulation policies learned in simulation to real-world robots. Existing approaches rely on manually designed domain randomization or task-specific adaptation, limiting their generalizability across diverse manipulation scenarios. We present DexSim2Real, an integrated framework that leverages vision-language foundation models to bridge the sim-to-real gap for dexterous manipulation. Our system combines three components: (1) Foundation Model-Guided Domain Randomization (FM-DR), which uses a vision-language model as a visual realism critic to optimize simulation parameters via closed-loop CMA-ES, complementing text-based approaches like DrEureka with direct visual feedback; (2) a Tactile-Visual Cross-Attention Policy (TVCAP) that adapts cross-attention visuo-tactile fusion to zero-shot sim-to-real RL; and (3) a…
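
The closed-loop idea behind FM-DR (propose simulation parameters, score them with a realism critic, and update the search distribution) can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: `realism_score` is a toy stand-in for a vision-language model critic, the three parameters and `TARGET` values are hypothetical, and a simple elite-averaging evolution strategy is used in place of CMA-ES, which would additionally adapt a full covariance matrix and step size.

```python
import numpy as np

# Hypothetical parameter values that the stand-in critic considers most realistic,
# e.g. normalized light intensity, surface roughness, and camera noise.
TARGET = np.array([0.6, 0.3, 0.8])


def realism_score(params):
    """Toy stand-in for a VLM realism critic; higher means 'looks more real'.

    A real system would render an image with `params` and query a model instead.
    """
    return -float(np.sum((params - TARGET) ** 2))


def optimize_sim_params(score_fn, mean, sigma=0.3, popsize=16, n_elite=4,
                        iterations=40, seed=0):
    """Simplified evolution strategy (not CMA-ES): sample, score, average the elites."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    for _ in range(iterations):
        # Propose candidate simulator settings around the current mean.
        candidates = mean + sigma * rng.normal(size=(popsize, mean.size))
        candidates = np.clip(candidates, 0.0, 1.0)   # keep parameters in a valid range
        # Closed loop: the critic scores each candidate.
        scores = np.array([score_fn(c) for c in candidates])
        # Move the search distribution toward the best-scoring candidates.
        elite = candidates[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)
        sigma *= 0.92   # fixed decay; CMA-ES would adapt this from the search history
    return mean


initial = np.array([0.5, 0.5, 0.5])
best = optimize_sim_params(realism_score, mean=initial)
```

The critic is treated as a black box that only returns a score, which is why a derivative-free optimizer fits this setting: no gradient of the critic's judgment with respect to the simulator parameters is needed.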
