Rethinking Transparent Object Grasping: Depth Completion with Monocular Depth Estimation and Instance Mask

Yaofeng Cheng; Xinkai Gao; Sen Zhang; Chao Zeng; Fusheng Zha; Lining Sun; Chenguang Yang

arXiv:2508.02507·cs.CV·March 26, 2026

TL;DR

This paper introduces ReMake, a depth completion framework that uses instance masks and monocular depth estimation to improve the accuracy and generalization of transparent object grasping in robotic systems.

Contribution

The proposed ReMake framework explicitly distinguishes transparent regions to enhance depth completion and generalization, outperforming existing methods in real-world scenarios.

Findings

01 Outperforms existing approaches on benchmark datasets

02 Achieves higher accuracy in real-world transparent object grasping

03 Demonstrates improved generalization to complex lighting conditions

Abstract

Because of their optical properties, transparent objects often cause depth cameras to produce incomplete or invalid depth data, which in turn reduces the accuracy and reliability of robotic grasping. Existing approaches typically feed the RGB-D image directly into a network that outputs the completed depth, expecting the model to implicitly infer which depth values are reliable. However, while effective on training datasets, such methods often fail to generalize to real-world scenarios, where complex light interactions lead to highly variable distributions of valid and invalid depth data. To address this, we propose ReMake, a novel depth completion framework guided by an instance mask and monocular depth estimation. By explicitly distinguishing transparent regions from non-transparent ones, the mask enables the model to concentrate on learning accurate depth estimation in these areas from…
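To make the roles of the two guidance signals concrete, here is a minimal, non-learned sketch. It is not the ReMake network, whose architecture is not described in the text above. It assumes, purely for illustration, that the monocular prediction differs from metric depth by an unknown scale and shift, fits those two numbers on pixels the instance mask marks as non-transparent, and uses them to overwrite the sensor depth inside the mask. The function names `fit_scale_shift` and `fill_transparent` are hypothetical.

```python
# Illustrative baseline only; NOT the method proposed in the paper.
# Assumption: metric_depth ≈ s * mono_depth + t for unknown scalars s, t.

def fit_scale_shift(mono, depth, mask):
    """Least-squares fit of s, t on trusted pixels.

    A pixel is trusted when the instance mask marks it as non-transparent
    (mask value 0) and the sensor returned a valid reading (depth > 0).
    """
    xs, ys = [], []
    for mono_row, depth_row, mask_row in zip(mono, depth, mask):
        for m, d, is_transparent in zip(mono_row, depth_row, mask_row):
            if not is_transparent and d > 0:
                xs.append(m)
                ys.append(d)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two trusted pixels")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("monocular depth is constant on trusted pixels")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s = cov / var_x
    t = mean_y - s * mean_x
    return s, t


def fill_transparent(depth, mono, mask):
    """Replace sensor depth inside the mask with aligned monocular depth.

    Pixels outside the mask keep their sensor value, including invalid zeros.
    """
    s, t = fit_scale_shift(mono, depth, mask)
    return [
        [s * m + t if is_transparent else d
         for m, d, is_transparent in zip(mono_row, depth_row, mask_row)]
        for mono_row, depth_row, mask_row in zip(mono, depth, mask)
    ]


# Toy 2x3 example: the middle column is a transparent object whose sensor
# readings (0.0 and 3.0) are unreliable.
mono = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
depth = [[0.7, 0.0, 1.1], [1.3, 3.0, 1.7]]
mask = [[0, 1, 0], [0, 1, 0]]
completed = fill_transparent(depth, mono, mask)
```

The sketch only shows why an explicit mask helps: the unreliable readings inside the transparent region never enter the fit, so they cannot corrupt it. A learned model guided by the same mask would replace the hand-written scale-and-shift step.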
