Depth-PC: A Visual Servo Framework Integrated with Cross-Modality Fusion for Sim2Real Transfer
Haoyu Zhang, Yang Liu, Yimu Jiang, Weiyang Lin, Chao Ye

TL;DR
Depth-PC introduces a novel visual servoing framework that combines cross-modality feature fusion and graph neural networks, enabling zero-shot Sim2Real transfer and improved accuracy in robotic manipulation tasks.
Contribution
The paper presents the first integration of cross-modal feature fusion and graph neural networks for visual servoing, facilitating zero-shot Sim2Real transfer in robotic tasks.
Findings
Superior convergence basin and accuracy compared to SOTA methods
Effective cross-modality feature fusion for servo tasks
Zero-shot Sim2Real transfer achieved in experiments
Abstract
Visual servoing techniques guide robotic motion using visual information to accomplish manipulation tasks, requiring high precision and robustness against noise. Traditional methods often require prior knowledge and are susceptible to external disturbances. Learning-driven alternatives, while promising, frequently struggle with the scarcity of training data and fall short in generalization. To address these challenges, we propose Depth-PC, a novel visual servoing framework that decouples simulation-based training from real-world inference, achieving zero-shot Sim2Real transfer for servo tasks. To exploit the spatial and geometric information in depth and point cloud features, we introduce cross-modal feature fusion, a first in servo tasks, followed by a dedicated Graph Neural Network to establish keypoint correspondences. Through simulation and real-world experiments, our approach…
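For context on the "prior knowledge" that the abstract attributes to traditional methods, the following is a minimal sketch of the classical image-based visual servoing (IBVS) control law, not the Depth-PC method. It assumes point features given in normalized image coordinates, which presupposes known camera intrinsics, and a depth estimate for each point. The function names, the gain value, and the sample points are hypothetical and chosen only for illustration.

```python
import numpy as np


def interaction_matrix(points, depths):
    """Stack the 2x6 interaction matrix of each normalized image point.

    points: (N, 2) normalized image coordinates (x, y).
    depths: (N,) estimated depth Z of each point (assumed known).
    """
    rows = []
    for (x, y), Z in zip(points, depths):
        rows.append([-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y])
        rows.append([0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x])
    return np.array(rows)


def ibvs_velocity(current, desired, depths, gain=0.5):
    """Classical IBVS law: v = -gain * pinv(L) @ (s - s_desired).

    Returns a 6-vector camera twist (vx, vy, vz, wx, wy, wz).
    """
    error = (np.asarray(current) - np.asarray(desired)).reshape(-1)
    L = interaction_matrix(current, depths)
    return -gain * np.linalg.pinv(L) @ error


# Hypothetical example: four points shifted away from their target positions.
desired = np.array([[-0.1, -0.1], [0.1, -0.1], [0.1, 0.1], [-0.1, 0.1]])
current = desired + 0.05
depths = np.full(4, 1.0)  # assumed depth estimates, in metres

velocity = ibvs_velocity(current, desired, depths)
print(velocity)
```

The sketch makes the dependence explicit: the control law needs matched feature points, their depths, and calibrated coordinates before it can produce a velocity command. Establishing such correspondences is the step the abstract assigns to the fused depth and point cloud features and the Graph Neural Network.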
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Computer Graphics and Visualization Techniques
