Monocular Depth Guided Occlusion-Aware Disparity Refinement via Semi-supervised Learning in Laparoscopic Images

Ziteng Liu; Dongdong He; Chenghong Zhang; Wenpeng Gao; Yili Fu

arXiv:2505.08178·cs.CV·May 14, 2025

TL;DR

This paper introduces DGORNet, a semi-supervised disparity refinement network for laparoscopic images that uses monocular depth, spatial context, and temporal information to improve accuracy, particularly in occluded and texture-less regions.

Contribution

The study presents a disparity refinement network that combines monocular depth guidance, a Position Embedding (PE) module, and an optical-flow-based loss (OFDLoss) to address occlusion and the scarcity of labeled data in surgical stereo images.

Findings

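01. Outperforms state-of-the-art methods on the SCARED dataset in End-Point Error (EPE) and Root Mean Squared Error (RMSE)

02. Effective in occluded and texture-less regions

03. Ablation confirms the importance of the Position Embedding (PE) module and OFDLoss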

Abstract

Occlusion and the scarcity of labeled surgical data are significant challenges in disparity estimation for stereo laparoscopic images. To address these issues, this study proposes a Depth Guided Occlusion-Aware Disparity Refinement Network (DGORNet), which refines disparity maps by leveraging monocular depth information unaffected by occlusion. A Position Embedding (PE) module is introduced to provide explicit spatial context, enhancing the network's ability to localize and refine features. Furthermore, we introduce an Optical Flow Difference Loss (OFDLoss) for unlabeled data, leveraging temporal continuity across video frames to improve robustness in dynamic surgical scenes. Experiments on the SCARED dataset demonstrate that DGORNet outperforms state-of-the-art methods in terms of End-Point Error (EPE) and Root Mean Squared Error (RMSE), particularly in occlusion and texture-less…
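The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions for disparity maps: EPE as the mean absolute disparity error and RMSE as the root of the mean squared error, both in pixels. The function name `disparity_errors` and the rule that ground-truth values less than or equal to zero mark missing labels are assumptions made for illustration; the paper's exact evaluation protocol on SCARED is not given in the text above.

```python
import numpy as np


def disparity_errors(pred, gt, valid=None):
    """Return (EPE, RMSE) in pixels over valid ground-truth pixels.

    Hypothetical helper, not taken from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        # Assumption: non-finite or non-positive ground truth means "no label".
        valid = np.isfinite(gt) & (gt > 0)
    diff = pred[valid] - gt[valid]
    if diff.size == 0:
        raise ValueError("no valid ground-truth pixels")
    epe = np.mean(np.abs(diff))          # mean absolute disparity error
    rmse = np.sqrt(np.mean(diff ** 2))   # root mean squared disparity error
    return float(epe), float(rmse)


# Toy 2x2 example; the pixel with gt == 0 is excluded from both metrics.
gt = np.array([[10.0, 20.0], [0.0, 40.0]])
pred = np.array([[11.0, 18.0], [5.0, 40.0]])
epe, rmse = disparity_errors(pred, gt)
print(f"EPE={epe:.4f} px, RMSE={rmse:.4f} px")
```

Because RMSE squares each error before averaging, it penalizes large outliers more heavily than EPE, which is one reason the two are commonly reported together. Passing an explicit `valid` mask, for example an occlusion mask, restricts both metrics to that region.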
