Pyramid Frequency Network with Spatial Attention Residual Refinement Module for Monocular Depth Estimation
Zhengyang Lu, Ying Chen

TL;DR
This paper introduces a Pyramid Frequency Network with a Spatial Attention Residual Refinement Module that enhances monocular depth estimation accuracy and robustness across various noise environments by leveraging multi-frequency features and attention mechanisms.
Contribution
The proposed PFN with SARRM is a novel deep-learning architecture that improves depth map detail and robustness in noisy conditions compared to existing methods.
Findings
Achieves superior visual accuracy on Make3D, KITTI, and NYUv2 datasets.
Demonstrates increased robustness in high-noise scenes.
Outperforms state-of-the-art methods on multiple benchmarks.
Abstract
Deep-learning-based approaches to depth estimation are rapidly advancing, offering superior performance over existing methods. To estimate depth in real-world scenarios, depth estimation models must be robust to a variety of noise environments. In this work, a Pyramid Frequency Network (PFN) with a Spatial Attention Residual Refinement Module (SARRM) is proposed to address the weak robustness of existing deep-learning methods. To reconstruct depth maps with accurate details, the SARRM builds a residual fusion method with an attention mechanism to refine the blurred depth. A frequency division strategy is designed, and a frequency pyramid network is developed to extract features from multiple frequency bands. With the frequency strategy, PFN achieves better visual accuracy than state-of-the-art methods in both indoor and outdoor scenes on Make3D, KITTI depth, and NYUv2…
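The abstract does not say how the frequency division is carried out, so the following is only an illustrative sketch of one common way to split an image into frequency bands: radial masks applied in the Fourier domain. The function name `split_frequency_bands`, the cutoff values, and the use of NumPy's FFT are assumptions made for this example and are not taken from the paper. In a pyramid-style network, each band could then be passed to its own feature-extraction branch.

```python
import numpy as np

def split_frequency_bands(image, cutoffs=(0.1, 0.3)):
    """Split a 2-D grayscale image into low/mid/high frequency bands.

    Hypothetical helper: the cutoffs (in cycles per pixel) are arbitrary
    illustration values, not settings reported by the authors.
    """
    h, w = image.shape
    # Frequency coordinate of every FFT bin along each axis.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    spectrum = np.fft.fft2(image)
    edges = (0.0, *cutoffs, np.inf)

    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Keep only the bins whose radial frequency lies in [lo, hi).
        mask = (radius >= lo) & (radius < hi)
        bands.append(np.fft.ifft2(spectrum * mask).real)
    return bands

# Random data stands in for a real input image.
rng = np.random.default_rng(0)
image = rng.random((32, 48))
bands = split_frequency_bands(image)

# The masks partition the spectrum, so the bands add back up to the input.
reconstruction = sum(bands)
```

Because the masks are disjoint and together cover every frequency bin, no information is lost in the split. Noise that is concentrated in the high-frequency band can then, in principle, be handled separately from the coarse scene structure carried by the low-frequency band.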
