Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios
Jialei Xu, Xianming Liu, Junjun Jiang, Kui Jiang, Rui Li, Kai Cheng, Xiangyang Ji

TL;DR
This paper introduces a multi-modal fusion framework that combines RGB and infrared data to improve monocular depth estimation in challenging environments like nighttime and adverse weather, leveraging confidence-guided fusion for robustness.
Contribution
It proposes a multi-modal depth estimation approach that computes a coarse depth map independently for each modality (RGB and infrared), predicts a confidence for each, and fuses the two end-to-end, improving accuracy in difficult scenarios.
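The idea of confidence-guided fusion can be illustrated with a minimal sketch. This is not the paper's learned fusion network: it is a plain per-pixel weighted average, assuming each modality already supplies a coarse depth map and a non-negative confidence map of the same shape. The function name `fuse_depth` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def fuse_depth(depth_rgb, depth_ir, conf_rgb, conf_ir, eps=1e-6):
    """Blend two coarse depth maps with a per-pixel confidence-weighted average.

    Illustrative stand-in only; the paper learns the fusion end-to-end.
    All inputs are arrays of the same shape; confidences are non-negative.
    """
    depth_rgb = np.asarray(depth_rgb, dtype=np.float64)
    depth_ir = np.asarray(depth_ir, dtype=np.float64)
    conf_rgb = np.asarray(conf_rgb, dtype=np.float64)
    conf_ir = np.asarray(conf_ir, dtype=np.float64)

    shapes = {depth_rgb.shape, depth_ir.shape, conf_rgb.shape, conf_ir.shape}
    if len(shapes) != 1:
        raise ValueError("all inputs must share the same shape")
    if (conf_rgb < 0).any() or (conf_ir < 0).any():
        raise ValueError("confidences must be non-negative")

    # Normalise the confidences into a weight for the RGB branch.
    # Where both confidences are (near) zero, fall back to an equal split.
    weight_sum = conf_rgb + conf_ir
    w_rgb = np.where(
        weight_sum > eps,
        conf_rgb / np.maximum(weight_sum, eps),
        0.5,
    )
    return w_rgb * depth_rgb + (1.0 - w_rgb) * depth_ir
```

With this sketch, a pixel where only the infrared branch is trusted (for example, a dark region at night) takes its depth entirely from the infrared map, while a pixel with equal confidences gets the mean of the two estimates.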
Findings
Effective depth estimation in challenging conditions
Robust performance on MS$^2$ and ViViD++ datasets
Outperforms single-modality methods
Abstract
Monocular depth estimation from RGB images plays a pivotal role in 3D vision. However, its accuracy can deteriorate in challenging environments such as nighttime or adverse weather conditions. While long-wave infrared cameras offer stable imaging in such challenging conditions, they are inherently low-resolution and lack the rich texture and semantics delivered by RGB images. Current methods focus solely on a single modality because of the difficulty of identifying and integrating faithful depth cues from both sources. To address these issues, this paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework. Concretely, we independently compute the coarse depth maps with separate networks by fully utilizing the individual depth cues from each modality. As the advantageous depth spreads across both modalities, we…
Taxonomy
Topics: BIM and Construction Integration
Methods: Focus
