DiFuse-Net: RGB and Dual-Pixel Depth Estimation using Window Bi-directional Parallax Attention and Cross-modal Transfer Learning
Kunal Swami, Debtanu Gupta, Amrit Kumar Muduli, Chirag Jaiswal, Pankaj Kumar Bajpai

TL;DR
DiFuse-Net introduces a novel dual-modal depth estimation approach leveraging window bi-directional parallax attention and cross-modal transfer learning, achieving superior performance on a new high-quality RGB-DP-D dataset.
Contribution
The paper presents DiFuse-Net, a new network architecture for RGB and dual-pixel depth estimation, along with a novel attention mechanism and a large-scale dataset for training and evaluation.
Findings
DiFuse-Net outperforms baseline stereo and DP methods.
The proposed attention mechanism effectively captures DP disparity cues.
The new DCDP dataset enables robust training and benchmarking.
Abstract
Depth estimation is crucial for intelligent systems, enabling applications from autonomous navigation to augmented reality. While traditional stereo and active depth sensors have limitations in cost, power, and robustness, dual-pixel (DP) technology, ubiquitous in modern cameras, offers a compelling alternative. This paper introduces DiFuse-Net, a novel modality-decoupled network design for disentangled RGB and DP based depth estimation. DiFuse-Net features a window bi-directional parallax attention mechanism (WBiPAM) specifically designed to capture the subtle DP disparity cues unique to small-aperture smartphone cameras. A separate encoder extracts contextual information from the RGB image, and these features are fused to enhance depth prediction. We also propose a Cross-modal Transfer Learning (CmTL) mechanism to utilize large-scale RGB-D datasets in the literature to cope with…
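The abstract names WBiPAM but does not describe its internals. As a rough illustration only, the sketch below implements generic parallax attention of the kind used in stereo networks, applied between the two DP views, under two assumptions of ours: "window" is read as limiting each pixel's attention to a small horizontal search range of plus or minus max_disp pixels (DP disparities on small-aperture sensors span only a few pixels), and "bi-directional" is read as running attention both left-to-right and right-to-left. The function names (window_parallax_attention, bidirectional_parallax_attention, soft_disparity) and the NumPy code are hypothetical and are not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_parallax_attention(query, source, max_disp):
    """Attend from each query pixel to source pixels on the same row.

    query, source: (H, W, C) feature maps of the two DP views.
    Each query pixel (y, x) attends to source pixels (y, x + d) for
    d in [-max_disp, max_disp] (assumed "window" = search range).
    Returns source features aggregated at query positions (H, W, C)
    and the attention weights (H, W, 2 * max_disp + 1).
    """
    H, W, C = query.shape
    offsets = np.arange(-max_disp, max_disp + 1)               # (D,)
    positions = np.arange(W)[:, None] + offsets[None, :]      # (W, D)
    valid = (positions >= 0) & (positions < W)                 # in-image candidates

    # Zero-pad horizontally so every candidate index is addressable.
    padded = np.pad(source, ((0, 0), (max_disp, max_disp), (0, 0)))
    candidates = padded[:, positions + max_disp, :]            # (H, W, D, C)

    # Scaled dot-product scores between each query pixel and its candidates.
    scores = np.einsum("hwc,hwdc->hwd", query, candidates) / np.sqrt(C)
    scores = np.where(valid[None], scores, -np.inf)            # mask out-of-image
    attn = softmax(scores, axis=-1)

    # Weighted sum of candidates: source view warped onto the query view.
    warped = np.einsum("hwd,hwdc->hwc", attn, candidates)
    return warped, attn


def bidirectional_parallax_attention(feat_l, feat_r, max_disp):
    # Run attention in both directions: left queries right, right queries left.
    r_to_l, attn_lr = window_parallax_attention(feat_l, feat_r, max_disp)
    l_to_r, attn_rl = window_parallax_attention(feat_r, feat_l, max_disp)
    return r_to_l, l_to_r, attn_lr, attn_rl


def soft_disparity(attn, max_disp):
    # Expected horizontal offset under the attention distribution.
    offsets = np.arange(-max_disp, max_disp + 1)
    return (attn * offsets).sum(axis=-1)


# Toy example: one-hot features per column; the right view is the left
# view shifted 2 pixels to the right (feat_r[:, x] == feat_l[:, x - 2]).
H, W, max_disp = 4, 16, 3
feat_l = np.tile(10.0 * np.eye(W)[None], (H, 1, 1))           # (H, W, C=W)
feat_r = np.roll(feat_l, shift=2, axis=1)

r_to_l, l_to_r, attn_lr, attn_rl = bidirectional_parallax_attention(
    feat_l, feat_r, max_disp)
print(np.round(soft_disparity(attn_lr, max_disp)[0, :12], 3))  # ~2.0 away from the right border
print(np.round(soft_disparity(attn_rl, max_disp)[0, 4:], 3))   # ~-2.0 away from the left border
```

In this toy case the left-to-right attention concentrates on an offset of +2 and the right-to-left attention on -2, so the two directions give consistent disparity estimates. A trained network would feed encoder features rather than raw one-hot vectors and would typically add learned query and key projections, which this sketch omits.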
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Advanced Vision and Imaging · Video Surveillance and Tracking Methods
