Learning Single-Image Depth from Videos using Quality Assessment Networks
Weifeng Chen, Shengyi Qian, Jia Deng

TL;DR
This paper introduces a method to generate high-quality training data for single-image depth estimation from internet videos using a Quality Assessment Network, leading to improved depth prediction in real-world scenarios.
Contribution
The paper combines SfM with a Quality Assessment Network to automatically build a dataset, YouTube3D, for training single-image depth estimation models.
Findings
YouTube3D dataset improves depth estimation accuracy
Quality Assessment Network identifies which SfM reconstructions are high-quality enough to keep as training data
State-of-the-art results achieved on in-the-wild depth estimation tasks
Abstract
Depth estimation from a single image in the wild remains a challenging problem. One main obstacle is the lack of high-quality training data for images in the wild. In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. The core of this method is a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. Experiments show that YouTube3D is useful in training depth estimation networks and advances the state of the art of single-view depth estimation in the wild.
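The filtering step described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the `Reconstruction` container, the `filter_reconstructions` helper, the threshold value, and the `toy_score` function are all hypothetical names introduced for illustration, and `toy_score` is only a stand-in for a learned Quality Assessment Network.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Reconstruction:
    """One candidate SfM result for a video frame (hypothetical container)."""
    frame_id: str
    depth_points: List[float]  # sparse depths recovered by SfM


def filter_reconstructions(
    candidates: Iterable[Reconstruction],
    score_fn: Callable[[Reconstruction], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[Reconstruction]:
    """Keep only candidates whose predicted quality meets the threshold."""
    return [r for r in candidates if score_fn(r) >= threshold]


def toy_score(r: Reconstruction) -> float:
    # Stand-in for a learned quality network: here, more recovered
    # points simply means a higher score, capped at 1.0.
    return min(len(r.depth_points) / 10.0, 1.0)
```

In a real pipeline, `score_fn` would be a trained network, and the surviving reconstructions would supply the depth supervision used to train the single-image depth estimator.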
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Processing Techniques and Applications
