Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth
Guangkai Xu, Wei Yin, Hao Chen, Chunhua Shen, Kai Cheng, Feng Wu, Feng Zhao

TL;DR
This paper introduces a locally weighted linear regression approach to recover scale and shift in monocular video depth estimation, improving consistency and accuracy in 3D scene reconstruction.
Contribution
It proposes a novel method for scale and shift recovery in monocular video depth estimation, enhancing existing approaches and enabling robust 3D scene reconstruction.
Findings
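Boosts the performance of existing state-of-the-art approaches by up to 50% on several zero-shot benchmarks.
Trains a strong ResNet50-backbone depth model on over 6.3 million merged RGBD images, surpassing DPT ViT-Large.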
Enables accurate 3D scene shape recovery from monocular videos.
Abstract
Existing monocular depth estimation methods have achieved excellent robustness in diverse scenes, but they can only retrieve affine-invariant depth, that is, depth up to an unknown scale and shift. In video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift in each per-frame prediction may cause depth inconsistency. To solve this problem, we propose a locally weighted linear regression method to recover the scale and shift from very sparse anchor points, which ensures scale consistency across consecutive frames. Extensive experiments show that our method can boost the performance of existing state-of-the-art approaches by up to 50% over several zero-shot benchmarks. In addition, we merge over 6.3 million RGBD images to train strong and robust depth models. Our produced ResNet50-backbone model even outperforms the…
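The idea of recovering scale and shift with a locally weighted linear regression over sparse anchor points can be illustrated with a small sketch. This is not the authors' implementation: the Gaussian kernel over pixel distance, the `bandwidth` parameter, the anchor format, and the function names `weighted_scale_shift` and `locally_align` are all assumptions made for illustration, since this page does not specify them. For each pixel, the sketch fits a scale s and shift t so that s * predicted + t matches the anchor depths, with nearby anchors weighted more heavily.

```python
import math


def weighted_scale_shift(pred, target, weights):
    """Closed-form weighted least squares for target ~ s * pred + t.

    Hypothetical helper: returns the (s, t) minimizing
    sum_i w_i * (s * pred_i + t - target_i) ** 2.
    """
    total = sum(weights)
    mean_p = sum(w * p for w, p in zip(weights, pred)) / total
    mean_t = sum(w * g for w, g in zip(weights, target)) / total
    var = sum(w * (p - mean_p) ** 2 for w, p in zip(weights, pred))
    cov = sum(w * (p - mean_p) * (g - mean_t)
              for w, p, g in zip(weights, pred, target))
    if var < 1e-12:
        # Degenerate case (all anchor predictions equal): fit a shift only.
        return 1.0, mean_t - mean_p
    scale = cov / var
    return scale, mean_t - scale * mean_p


def locally_align(pred_depth, anchors, bandwidth=2.0):
    """Align an affine-invariant depth map to sparse anchor depths.

    pred_depth: 2D list of predicted (affine-invariant) depths.
    anchors:    list of (row, col, reference_depth) tuples (assumed format).
    bandwidth:  Gaussian kernel width in pixels (assumed kernel choice).
    """
    height, width = len(pred_depth), len(pred_depth[0])
    anchor_pred = [pred_depth[r][c] for r, c, _ in anchors]
    anchor_ref = [d for _, _, d in anchors]
    aligned = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Nearby anchors get larger weights; the small floor avoids
            # an all-zero weight vector far away from every anchor.
            weights = [
                math.exp(-((y - r) ** 2 + (x - c) ** 2)
                         / (2.0 * bandwidth ** 2)) + 1e-8
                for r, c, _ in anchors
            ]
            s, t = weighted_scale_shift(anchor_pred, anchor_ref, weights)
            aligned[y][x] = s * pred_depth[y][x] + t
    return aligned
```

Because each pixel gets its own scale and shift, the resulting maps vary smoothly over the image instead of applying one global correction. When the anchors are exactly consistent with a single affine transform, every local fit reduces to that same transform.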
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Processing Techniques and Applications
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Convolution · Dense Connections · Residual Connection · Layer Normalization · Dense Prediction Transformer
