TL;DR
This paper introduces an attention-based hierarchical multi-modal fusion network that significantly improves guided depth map super-resolution by effectively combining low-resolution depth and high-resolution RGB guidance, outperforming existing methods.
Contribution
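The paper proposes a novel attention-based hierarchical multi-modal fusion (AHMF) network for guided depth map super-resolution (DSR). Its multi-modal attention based fusion (MMAF) and bi-directional hierarchical feature collaboration (BHFC) modules improve feature extraction and fusion, addressing how to select and propagate structures that are consistent between depth and RGB guidance while handling inconsistent ones.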
Findings
Outperforms state-of-the-art methods in reconstruction accuracy
Runs faster than the compared methods
Uses less memory than the compared methods
Abstract
A depth map records the distance between the viewpoint and objects in the scene, which plays a critical role in many real-world applications. However, depth maps captured by consumer-grade RGB-D cameras suffer from low spatial resolution. Guided depth map super-resolution (DSR) is a popular approach to this problem: it attempts to restore a high-resolution (HR) depth map from the input low-resolution (LR) depth map and its coupled HR RGB image, which serves as the guidance. The most challenging problems for guided DSR are how to correctly select consistent structures and propagate them, and how to properly handle inconsistent ones. In this paper, we propose a novel attention-based hierarchical multi-modal fusion (AHMF) network for guided DSR. Specifically, to effectively extract and combine relevant information from LR depth and HR guidance, we propose a multi-modal attention based fusion…
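To make the guided DSR setting concrete, here is a minimal sketch of its input/output contract: an LR depth map is brought onto the HR grid and then blended with HR guidance using per-pixel attention weights. This is not the paper's MMAF module. The function names (`upsample_nearest`, `fuse_with_attention`) and the magnitude-based scoring rule are assumptions made for illustration only; a real network would learn the attention weights from features rather than compute them from raw values.

```python
import math

def upsample_nearest(depth_lr, scale):
    """Nearest-neighbour upsampling of a 2-D list onto the HR grid."""
    hr = []
    for row in depth_lr:
        # Repeat every pixel `scale` times horizontally...
        wide = [v for v in row for _ in range(scale)]
        # ...and every widened row `scale` times vertically.
        hr.extend([list(wide) for _ in range(scale)])
    return hr

def fuse_with_attention(depth_feat, guide_feat):
    """Blend two same-sized maps with a per-pixel softmax gate.

    Assumption: the score of each modality is its absolute value.
    This stands in for a learned attention score and is hypothetical.
    """
    fused = []
    for d_row, g_row in zip(depth_feat, guide_feat):
        out_row = []
        for d, g in zip(d_row, g_row):
            e_d, e_g = math.exp(abs(d)), math.exp(abs(g))
            w_d = e_d / (e_d + e_g)  # softmax weight for the depth branch
            out_row.append(w_d * d + (1.0 - w_d) * g)
        fused.append(out_row)
    return fused

# Toy inputs: a 2x2 LR depth map and a 4x4 HR guidance map (made-up values).
depth_lr = [[1.0, 2.0],
            [3.0, 4.0]]
guide_hr = [[1.0, 1.5, 2.0, 2.0],
            [1.0, 1.5, 2.0, 2.0],
            [3.0, 3.5, 4.0, 4.0],
            [3.0, 3.5, 4.0, 4.0]]

depth_up = upsample_nearest(depth_lr, 2)         # now 4x4, matches guidance
fused = fuse_with_attention(depth_up, guide_hr)  # 4x4 fused map
```

Because the gate is a convex combination, each fused value lies between the corresponding depth and guidance values, and pixels where the two inputs agree pass through unchanged.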
