DeepV2D: Video to Depth with Differentiable Structure from Motion
Zachary Teed, Jia Deng

TL;DR
DeepV2D is an end-to-end trainable framework for estimating depth from video. It converts classical geometric algorithms into differentiable modules, combining the representational power of neural networks with the geometry of image formation.
Contribution
The paper presents an architecture that turns classical geometric algorithms into trainable modules and composes them into a single differentiable network for predicting depth from video sequences.
Findings
Achieves accurate depth estimation by alternating between motion inference and depth inference.
Successfully integrates classical geometric algorithms into a trainable deep learning model.
Provides an end-to-end differentiable system for video-to-depth estimation.
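The alternation in the first finding can be illustrated with a deliberately simplified toy problem. This is not DeepV2D's method, which uses learned networks and full camera poses; it is plain alternating least squares under assumptions of our own: focal length 1, a purely sideways camera translation t[k] for each frame, and an inverse depth d[i] for each point, so the observed image displacement is u[k][i] = t[k] * d[i]. The function names `alternate_motion_depth` and `residual` are hypothetical. Each motion step solves for t with d held fixed, each depth step solves for d with t held fixed, and the global scale, which monocular video cannot determine, is pinned by normalizing the mean inverse depth to 1.

```python
def residual(u, t, d):
    """Sum of squared differences between observed and predicted displacements."""
    return sum((u[k][i] - t[k] * d[i]) ** 2
               for k in range(len(t)) for i in range(len(d)))


def alternate_motion_depth(u, d_init, iters=10):
    """Toy alternating estimation for the hypothetical model u[k][i] = t[k] * d[i].

    u:      K rows (frames), each holding N observed image displacements
    d_init: initial guess for the N inverse depths (must not be all zero)
    Returns (t, d, history), where history[j] is the residual after iteration j.
    """
    d = list(d_init)
    history = []
    for _ in range(iters):
        # Motion step: per-frame least-squares translation with depth held fixed.
        dd = sum(x * x for x in d)
        t = [sum(row[i] * d[i] for i in range(len(d))) / dd for row in u]
        # Depth step: per-point least-squares inverse depth with motion held fixed.
        tt = sum(x * x for x in t)
        d = [sum(u[k][i] * t[k] for k in range(len(t))) / tt for i in range(len(d))]
        # Monocular video fixes depth only up to scale: normalize the mean inverse
        # depth to 1 and rescale t by the same factor so every t[k] * d[i] is unchanged.
        s = sum(d) / len(d)
        d = [x / s for x in d]
        t = [x * s for x in t]
        history.append(residual(u, t, d))
    return t, d, history


# Usage with made-up, noise-free observations.
true_t = [0.5, -1.0, 2.0]        # hypothetical per-frame translations
true_d = [0.2, 0.5, 1.0, 0.8]    # hypothetical inverse depths
u = [[tk * di for di in true_d] for tk in true_t]
t, d, history = alternate_motion_depth(u, d_init=[1.0] * 4)
```

Because each half-step is an exact least-squares solve and the rescaling leaves every product t[k] * d[i] unchanged, the residual cannot increase from one iteration to the next. With noise-free data like this it becomes essentially zero after the first iteration; noisy data would need several. In DeepV2D the two stages are trainable modules rather than closed-form solves like these.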
Abstract
We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to accurate depth. Code is available at https://github.com/princeton-vl/DeepV2D.
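The geometric principles governing image formation include the pinhole camera model. As an illustration, and not code taken from the linked repository, the sketch below performs the standard reprojection used throughout structure from motion: back-project a pixel with known depth into 3-D, apply a rigid motion (R, t), and project the point into the second camera. The function name `reproject` and the (fx, fy, cx, cy) intrinsics tuple are our own conventions.

```python
def reproject(x, y, depth, K, R, t):
    """Map pixel (x, y) with known depth in frame 1 to pixel coordinates in frame 2.

    K: (fx, fy, cx, cy) pinhole intrinsics, shared by both frames (assumption).
    R: 3x3 rotation as a list of rows; t: length-3 translation.
    Convention: X2 = R @ X1 + t maps frame-1 camera coordinates to frame 2.
    """
    fx, fy, cx, cy = K
    # Back-project: pixel plus depth gives a 3-D point in frame-1 camera coordinates.
    X = [(x - cx) / fx * depth, (y - cy) / fy * depth, depth]
    # Rigid motion into frame-2 camera coordinates.
    X2 = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    if X2[2] <= 0:
        raise ValueError("point is behind the second camera")
    # Perspective projection back to pixel coordinates.
    return fx * X2[0] / X2[2] + cx, fy * X2[1] / X2[2] + cy


K = (100.0, 100.0, 50.0, 50.0)
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u2, v2 = reproject(50.0, 50.0, 2.0, K, I, [-0.1, 0.0, 0.0])
```

With R equal to the identity and t = (t_x, 0, 0), the pixel shifts horizontally by fx * t_x / depth, so nearer points move farther across the image; this depth-dependent displacement is what allows camera motion to reveal depth.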
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Optical measurement and interference techniques
