TL;DR
ViserDex introduces a sim-to-real framework using 3D Gaussian Splatting and domain randomization for robust monocular RGB in-hand object reorientation, enabling effective policy training on consumer hardware.
Contribution
The paper presents a sim-to-real approach for dexterous in-hand manipulation that combines domain randomization in 3D Gaussian Splatting space with curriculum-based reinforcement learning and teacher-student distillation, reducing hardware and computational requirements.
Findings
Outperforms conventional rendering-based pose estimators in challenging environments.
Successfully reorients diverse objects with a multi-fingered hand under difficult lighting.
Perception and control models can be trained independently on consumer-grade hardware.
Abstract
In-hand object reorientation requires precise estimation of the object pose to handle complex task dynamics. While RGB sensing offers rich semantic cues for pose tracking, existing solutions rely on multi-camera setups or costly ray tracing. We present a sim-to-real framework for monocular RGB in-hand reorientation that integrates 3D Gaussian Splatting (3DGS) to bridge the visual sim-to-real gap. Our key insight is performing domain randomization in the Gaussian representation space: by applying physically consistent, pre-rendering augmentations to 3D Gaussians, we generate photorealistic, randomized visual data for object pose estimation. The manipulation policy is trained using curriculum-based reinforcement learning with teacher-student distillation, enabling efficient learning of complex behaviors. Importantly, both perception and control models can be trained independently on…
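To make the idea of randomizing in Gaussian space before rendering more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say which augmentations are used, so the function name `randomize_gaussians`, its parameters, and the choice of augmentations (one random rigid transform plus a per-channel color gain) are all assumptions made for illustration. A real 3DGS scene also stores opacities and spherical-harmonic color coefficients, which this sketch leaves out.

```python
import numpy as np

def randomize_gaussians(means, covs, colors, rng,
                        max_angle_rad=0.1, max_shift=0.01,
                        gain_range=(0.7, 1.3)):
    """Hypothetical pre-rendering augmentation of a set of 3D Gaussians.

    means:  (N, 3) Gaussian centers
    covs:   (N, 3, 3) Gaussian covariance matrices
    colors: (N, 3) RGB values in [0, 1]
    """
    # Random rotation about a random axis, built with Rodrigues' formula.
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(-max_angle_rad, max_angle_rad)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

    # Random translation, applied to every Gaussian so the set moves rigidly.
    t = rng.uniform(-max_shift, max_shift, size=3)

    # Centers and covariances are transformed together so the geometry stays consistent.
    new_means = means @ R.T + t
    new_covs = R @ covs @ R.T

    # Per-channel gain as a crude stand-in for a lighting or color change.
    gain = rng.uniform(gain_range[0], gain_range[1], size=3)
    new_colors = np.clip(colors * gain, 0.0, 1.0)

    return new_means, new_covs, new_colors, R, t


rng = np.random.default_rng(0)
means = rng.normal(size=(5, 3))
covs = np.tile(np.diag([0.01, 0.02, 0.03]), (5, 1, 1))
colors = rng.uniform(size=(5, 3))
aug_means, aug_covs, aug_colors, R, t = randomize_gaussians(means, covs, colors, rng)
```

Because the same rotation is applied to the centers and the covariances, distances between Gaussians and the shape of each Gaussian are preserved. Under the assumptions above, the augmented set could then be handed to any 3DGS rasterizer to produce a randomized training image.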
