Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection
Xincheng Pang, Wenke Xia, Zhigang Wang, Bin Zhao, Di Hu, Dong Wang, Xuelong Li

TL;DR
This paper introduces a Depth Information Injection framework that enhances pre-trained RGB-based robotic manipulation policies with 3D perception by using depth data during fine-tuning and virtual depth generation during deployment.
Contribution
The proposed DI^2 framework integrates depth information into RGB-based policies through a depth completion module and a depth-aware codebook, improving manipulation performance.
Findings
Enhanced manipulation accuracy in simulated environments
Effective transfer to real-world scenarios
Improved 3D perception in pre-trained policies
Abstract
3D perception ability is crucial for generalizable robotic manipulation. While recent foundation models have made significant strides in perception and decision-making with RGB-based input, their lack of 3D perception limits their effectiveness in fine-grained robotic manipulation tasks. To address these limitations, we propose a Depth Information Injection (DI^2) framework that leverages the RGB-Depth modality for policy fine-tuning, while relying solely on RGB images for robust and efficient deployment. Concretely, we introduce the Depth Completion Module (DCM) to extract the spatial prior knowledge related to depth information and generate virtual depth information from RGB inputs to aid policy deployment. Further, we propose the Depth-Aware Codebook (DAC) to eliminate noise and reduce the cumulative error from the depth prediction. In the inference phase, this…
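The abstract does not say how the Depth-Aware Codebook is built or queried. As a rough illustration of the general idea of a codebook lookup only, the sketch below snaps each noisy feature vector to its nearest entry in a fixed codebook, which is how generic vector quantization discards small prediction noise. The function names, the squared-Euclidean distance, and the toy values are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def nearest_code(feature: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `feature`.

    Assumption: closeness is squared Euclidean distance (hypothetical choice).
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(feature, codebook[k]))


def quantize(
    features: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Replace every feature vector with its nearest codebook entry."""
    indices = [nearest_code(f, codebook) for f in features]
    quantized = [list(codebook[i]) for i in indices]
    return quantized, indices


# Toy data (hypothetical): three codes and three slightly perturbed features.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
noisy_features = [[0.1, -0.05], [0.9, 1.2], [1.8, 0.1]]

quantized, indices = quantize(noisy_features, codebook)
```

In this toy case each perturbed vector maps back to the clean code it was derived from, so the perturbation is removed entirely. A learned codebook in a real policy would operate on high-dimensional depth features rather than 2-D points.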
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Traffic Prediction and Management Techniques · Machine Learning and Data Classification
