Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection

Xincheng Pang; Wenke Xia; Zhigang Wang; Bin Zhao; Di Hu; Dong Wang; Xuelong Li

arXiv:2408.05107·cs.RO·August 12, 2024

TL;DR

This paper introduces a Depth Information Injection framework that enhances pre-trained RGB-based robotic manipulation policies with 3D perception by using depth data during fine-tuning and virtual depth generation during deployment.

Contribution

The proposed $DI^{2}$ framework injects depth information into pre-trained RGB-based policies through two components, a Depth Completion Module (DCM) and a Depth-Aware Codebook (DAC), improving manipulation performance.

Findings

01 Enhanced manipulation accuracy in simulated environments

02 Effective transfer to real-world scenarios

03 Improved 3D perception in pre-trained policies

Abstract

3D perception ability is crucial for generalizable robotic manipulation. While recent foundation models have made significant strides in perception and decision-making with RGB-based input, their lack of 3D perception limits their effectiveness in fine-grained robotic manipulation tasks. To address these limitations, we propose a Depth Information Injection ($DI^{2}$) framework that leverages the RGB-Depth modality for policy fine-tuning, while relying solely on RGB images for robust and efficient deployment. Concretely, we introduce the Depth Completion Module (DCM) to extract the spatial prior knowledge related to depth information and generate virtual depth information from RGB inputs to aid policy deployment. Further, we propose the Depth-Aware Codebook (DAC) to eliminate noise and reduce the cumulative error from the depth prediction. In the inference phase, this…
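The abstract does not spell out how the Depth-Aware Codebook operates, so the following is only a minimal sketch of one generic way a codebook can suppress noise: nearest-neighbor vector quantization, where each noisy feature vector is snapped to its closest stored entry. The function name `quantize`, the toy 2-D codebook, and the hand-picked noise values are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def quantize(features: np.ndarray, codebook: np.ndarray):
    """Snap each feature vector to its nearest codebook entry (squared L2).

    features: (N, D) array of possibly noisy feature vectors.
    codebook: (K, D) array of stored entries.
    Returns the quantized features (N, D) and the chosen indices (N,).
    """
    # Pairwise differences via broadcasting: (N, 1, D) - (1, K, D) -> (N, K, D)
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)   # (N, K) squared distances
    indices = np.argmin(dists, axis=1)    # nearest entry per feature
    return codebook[indices], indices


# Hypothetical toy codebook with three 2-D entries (assumption, not from the paper).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])

# "Clean" features are exact codebook entries; small fixed perturbations stand in
# for the error of a predicted (virtual) depth feature.
clean = codebook[[0, 1, 2, 1]]
noise = np.array([[0.05, -0.10], [-0.08, 0.12], [0.10, 0.03], [0.02, -0.07]])
noisy = clean + noise

quantized, idx = quantize(noisy, codebook)
```

In this toy setting the perturbations are much smaller than the spacing between entries, so quantization recovers the clean features exactly. That is the intuition behind using a discrete codebook to keep prediction noise from accumulating, though the paper's actual design may differ.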

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Traffic Prediction and Management Techniques · Machine Learning and Data Classification