Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey
Uchitha Rajapaksha, Ferdous Sohel, Hamid Laga, Dean Diepeveen, Mohammed Bennamoun

TL;DR
This survey comprehensively reviews deep learning methods for monocular depth estimation from images and videos, highlighting their evolution, challenges, architectures, and evaluation metrics over the past decade.
Contribution
It provides a detailed taxonomy, discusses milestones, and analyzes datasets and evaluation metrics, offering a complete overview of the field.
Findings
Extensive classification of existing methods based on input, output, architecture, and supervision.
Identification of key milestones and trends in monocular depth estimation.
Analysis of datasets and evaluation metrics used in the field.
Abstract
Estimating depth from single RGB images and videos is of widespread interest due to its applications in many areas, including autonomous driving, 3D reconstruction, digital entertainment, and robotics. More than 500 deep learning-based papers have been published in the past 10 years, which indicates the growing interest in the task. This paper presents a comprehensive survey of the existing deep learning-based methods, the challenges they address, and how their architectures and supervision methods have evolved. It provides a taxonomy that classifies current methods by their input and output modalities, network architectures, and learning methods. It also discusses the major milestones in the history of monocular depth estimation, as well as the different pipelines, datasets, and evaluation metrics used in existing methods.
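To make "evaluation metrics" concrete, here is a minimal sketch of error and accuracy measures that are commonly reported in the monocular depth literature: absolute relative error, squared relative error, RMSE, log RMSE, and the threshold accuracies δ < 1.25^k. The exact set and protocol the survey analyzes may differ. The function name `depth_metrics` is hypothetical. The sketch assumes predicted and ground-truth depths are already flattened, pixel-aligned, and on the same scale, and it treats non-positive values as invalid pixels.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: common monocular depth metrics over aligned pixels.

    Assumes pred and gt are per-pixel depths on the same scale.
    Pixels where either value is <= 0 are treated as invalid and skipped.
    """
    # Keep only valid pixel pairs (log-based metrics need positive depths).
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)

    # Error metrics (lower is better).
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    # Accuracy metrics (higher is better): fraction of pixels whose
    # ratio max(p/g, g/p) falls below 1.25, 1.25^2, and 1.25^3.
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = {
        f"delta<1.25^{k}": sum(r < 1.25 ** k for r in ratios) / n
        for k in (1, 2, 3)
    }

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        **deltas,
    }


if __name__ == "__main__":
    # Toy example with made-up depths in meters; the last ground-truth pixel is invalid.
    print(depth_metrics([1.1, 2.0, 3.5, 4.0], [1.0, 2.2, 3.0, 0.0]))
```

This sketch leaves out preprocessing steps that many evaluation protocols add, such as capping depth at a maximum range or median-scaling scale-ambiguous predictions. Those steps would be applied to `pred` and `gt` before calling the function.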
