End-to-end depth from motion with stabilized monocular videos
Clément Pinard, Laure Chevalley, Antoine Manzanera, David Filliat

TL;DR
This paper introduces a monocular video-based depth inference system using a new dataset that simulates stabilized aerial footage, demonstrating effective depth prediction in rigid scenes with a fully convolutional network.
Contribution
The paper presents a novel dataset and an end-to-end convolutional architecture for depth inference from stabilized monocular videos, simplifying the structure from motion problem.
Findings
Effective depth prediction in stabilized monocular videos
Locally solvable problem tied to camera parameters
Good quality depth maps achieved
Abstract
We propose a depth map inference system for monocular videos, based on a novel navigation dataset that mimics aerial footage from a gimbal-stabilized monocular camera in rigid scenes. Unlike in most navigation datasets, the lack of rotation makes for an easier structure-from-motion problem, which can be leveraged for tasks such as depth inference and obstacle avoidance. We also propose an architecture for end-to-end depth inference with a fully convolutional network. Results show that, although tied to the camera's intrinsic parameters, the problem is locally solvable and leads to good-quality depth prediction.
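To illustrate why the absence of rotation simplifies the problem, here is a minimal sketch of the classical pinhole-camera relation between per-pixel displacement and depth under pure translation. It is not the paper's network, which learns depth with a fully convolutional model; the function names, the focal length `focal`, the principal point `(cx, cy)`, and the assumption that the camera translation is known are all illustrative choices made here. The sketch also assumes every scene point stays in front of the camera after the move (depth greater than the forward translation).

```python
import numpy as np


def pixel_grid(height, width, cx, cy):
    """Pixel coordinates measured from the (assumed) principal point."""
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    return xs - cx, ys - cy


def flow_from_depth(depth, translation, focal, cx, cy):
    """Displacement of each pixel when the camera translates by
    (tx, ty, tz) with no rotation, in a static (rigid) scene."""
    tx, ty, tz = translation
    x, y = pixel_grid(*depth.shape, cx, cy)
    denom = depth - tz  # depth of the same point seen from the second view
    du = (x * tz - focal * tx) / denom
    dv = (y * tz - focal * ty) / denom
    return du, dv


def depth_from_flow(du, dv, translation, focal, cx, cy):
    """Invert the relation above independently at every pixel.
    Depth is undefined (NaN) where the displacement is zero."""
    tx, ty, tz = translation
    x, y = pixel_grid(*du.shape, cx, cy)
    parallax = np.hypot(x * tz - focal * tx, y * tz - focal * ty)
    magnitude = np.hypot(du, dv)
    with np.errstate(divide="ignore", invalid="ignore"):
        depth = tz + parallax / magnitude
    depth[magnitude == 0] = np.nan
    return depth


# Synthetic check: a depth ramp from 5 to 50 (arbitrary units).
true_depth = np.linspace(5.0, 50.0, 24 * 32).reshape(24, 32)
motion = (0.3, -0.1, 0.5)  # hypothetical camera translation
du, dv = flow_from_depth(true_depth, motion, focal=100.0, cx=15.5, cy=11.5)
recovered = depth_from_flow(du, dv, motion, focal=100.0, cx=15.5, cy=11.5)
```

Each pixel's depth is recovered from that pixel's displacement alone, which is one way to read "locally solvable", and the focal length and principal point enter the formula directly, which is one way to read the dependence on camera parameters. With rotation present, the displacement would gain a depth-independent term that has to be removed first.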
