CDPR: Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation

Rongjia Yu; Tong Jia; Hao Wang; Xiaofang Li; Xiao Yang; Zinuo Zhang; Cuiwei Liu

arXiv:2604.11097·cs.CV·April 14, 2026

TL;DR

This paper introduces CDPR, a diffusion-based framework that combines RGB and polarization data to improve monocular depth estimation, especially on textureless, transparent, and specular surfaces where RGB cues are weak.

Contribution

CDPR encodes RGB and polarization (AoLP/DoLP) images into a shared latent space with a pre-trained VAE and combines them through confidence-aware fusion inside a diffusion model, improving robustness over RGB-only methods.

Findings

01. Outperforms RGB-only baselines in challenging regions

02. Handles reflective and transparent surfaces effectively

03. Generalizes to surface normal prediction with minimal changes

Abstract

Monocular depth estimation is a fundamental yet challenging task in computer vision, especially under complex conditions such as textureless surfaces, transparency, and specular reflections. Recent diffusion-based approaches have significantly advanced performance by reformulating depth prediction as a denoising process in the latent space. However, existing methods rely solely on RGB inputs, which often lack sufficient cues in challenging regions. In this work, we present CDPR - Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation - a novel diffusion-based framework that integrates physically grounded polarization priors to enhance estimation robustness. Specifically, we encode both RGB and polarization (AoLP/DoLP) images into a shared latent space via a pre-trained Variational Autoencoder (VAE), and dynamically fuse multi-modal information through a…
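The AoLP and DoLP images mentioned in the abstract are standard polarization quantities: the angle and degree of linear polarization. As a minimal sketch of how they are commonly derived, the code below computes both from four intensity measurements taken behind a linear polarizer at 0°, 45°, 90°, and 135°, using the linear Stokes parameters. The four-angle capture setup and the function names `stokes_from_intensities`, `dolp_aolp`, and `polarization_maps` are assumptions made for illustration; the page does not say how the paper acquires or preprocesses its polarization data.

```python
import math


def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities at 0/45/90/135 degrees.

    Assumes an ideal four-angle polarizer capture (an illustrative
    assumption, not a detail taken from the paper).
    """
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical
    s2 = i45 - i135                     # +45 vs. -45 degrees
    return s0, s1, s2


def dolp_aolp(i0, i45, i90, i135, eps=1e-8):
    """Return (DoLP, AoLP) for one pixel; AoLP is in radians in [0, pi)."""
    s0, s1, s2 = stokes_from_intensities(i0, i45, i90, i135)
    # Degree of linear polarization: polarized fraction of total intensity.
    dolp = math.hypot(s1, s2) / max(s0, eps)
    # Angle of linear polarization: half the angle of the (s1, s2) vector.
    aolp = 0.5 * math.atan2(s2, s1)
    # Clamp DoLP against noise and wrap AoLP into [0, pi).
    return min(dolp, 1.0), aolp % math.pi


def polarization_maps(img0, img45, img90, img135):
    """Apply dolp_aolp per pixel to four equally sized 2D lists."""
    dolp_map, aolp_map = [], []
    for r0, r45, r90, r135 in zip(img0, img45, img90, img135):
        pairs = [dolp_aolp(*px) for px in zip(r0, r45, r90, r135)]
        dolp_map.append([d for d, _ in pairs])
        aolp_map.append([a for _, a in pairs])
    return dolp_map, aolp_map
```

In this formulation, fully polarized light gives a DoLP of 1 and unpolarized light gives 0, while the AoLP is only meaningful where the DoLP is well above the noise floor. That dependence is one reason a fusion scheme might weight polarization cues by a confidence signal, though the exact mechanism CDPR uses is not stated on this page.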
