Scale Propagation Network for Generalizable Depth Completion
Haotian Wang, Meng Yang, Xinhu Zheng, and Gang Hua

TL;DR
This paper introduces a novel scale propagation normalization (SP-Norm) for depth completion, enabling models to generalize better across unseen scenes by propagating scale information and leveraging a new architecture based on ConvNeXt V2.
Contribution
The authors propose SP-Norm to improve depth completion generalization and develop a new architecture with ConvNeXt V2, achieving superior performance on diverse unseen datasets.
Findings
Outperforms state-of-the-art methods in accuracy
Faster inference with lower memory usage
Effective across various sparse depth map types
Abstract
Depth completion, inferring dense depth maps from sparse measurements, is crucial for robust 3D perception. Although deep-learning-based methods have made tremendous progress on this problem, these models cannot generalize well across scenes that are unobserved in training, posing a fundamental limitation that has yet to be overcome. A careful analysis of existing deep neural network architectures for depth completion, which are largely borrowed from successful backbones for image analysis tasks, reveals that a key design bottleneck actually resides in the conventional normalization layers. These normalization layers are designed, on the one hand, to make training more stable and, on the other hand, to build more visual invariance across scene scales. However, in depth completion, the scale is actually what we want to robustly estimate in order to better generalize to unseen scenes. To…
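The abstract's point about conventional normalization can be illustrated with a small sketch. This is not the paper's SP-Norm; it is only a plain standardization (zero mean, unit variance, as layer normalization does before its learned affine step) applied to hypothetical depth values, showing that the output is the same whether the scene is near or ten times farther away. The function name `layer_norm` and the sample values are assumptions made for illustration.

```python
import math


def layer_norm(values, eps=1e-12):
    """Standardize a list of floats to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Hypothetical depth values (in metres) for a scene, and the same scene
# with every depth multiplied by 10 (illustrative numbers only).
near = [1.0, 2.0, 4.0, 8.0]
far = [10.0 * v for v in near]

norm_near = layer_norm(near)
norm_far = layer_norm(far)

# Largest element-wise gap between the two normalized outputs.
max_diff = max(abs(a - b) for a, b in zip(norm_near, norm_far))
print(max_diff)  # practically zero: the factor of 10 has been normalized away
```

After standardization the two inputs are practically indistinguishable, so a layer downstream cannot tell the near scene from the far one. That loss of absolute scale is the bottleneck the abstract describes for depth completion.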
Taxonomy
Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques
Methods: ConvNeXt · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
