Absolute distance prediction based on deep learning object detection and monocular depth estimation models
Armin Masoumian, David G. F. Marei, Saddam Abdulwahab, Julian, Cristiano, Domenec Puig, Hatem A. Rashwan

TL;DR
This paper introduces a deep learning framework combining object detection and depth estimation from a single image to accurately predict absolute object distances in outdoor scenes.
Contribution
It presents a novel combination of YOLOv5 for object detection and a self-supervised autoencoder for depth estimation to determine absolute distances from monocular images.
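The detection-plus-depth combination can be illustrated with a small sketch that reads one distance per detected object off a depth map. It assumes three things. First, the depth map is already in metric units. Second, boxes use YOLO-style (x1, y1, x2, y2) pixel corners. Third, the median depth inside each box is a reasonable estimate of the object's distance. `object_distances` is a hypothetical helper, and this summary does not say how the paper aggregates depth within a box.

```python
from statistics import median
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive (assumed convention)
Box = Tuple[int, int, int, int]


def object_distances(depth: Sequence[Sequence[float]], boxes: List[Box]) -> List[float]:
    """Return one distance per box: the median depth of the pixels inside it.

    Hypothetical helper: `depth` is a row-major 2D grid of metric depths and
    `boxes` come from an object detector such as YOLOv5.
    """
    h = len(depth)
    w = len(depth[0]) if h else 0
    distances = []
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the image so partially visible objects still work
        x1, x2 = max(0, x1), min(w, x2)
        y1, y2 = max(0, y1), min(h, y2)
        if x1 >= x2 or y1 >= y2:
            raise ValueError(f"box is empty after clipping: {(x1, y1, x2, y2)}")
        values = [depth[y][x] for y in range(y1, y2) for x in range(x1, x2)]
        # Median is robust to background pixels that leak into the box edges
        distances.append(median(values))
    return distances
```

The median is chosen here because a bounding box usually also contains background pixels near its border; a mean would be pulled toward them.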
Findings
Achieved 96% accuracy in absolute distance prediction
RMSE of 0.203 in distance estimation
Framework effective on real outdoor images
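The findings report an RMSE and an accuracy figure without defining them here. RMSE is the square root of the mean squared difference between predicted and ground-truth distances. For accuracy, a common metric in depth-estimation work is the threshold accuracy δ < 1.25. This is the fraction of predictions whose ratio to the ground truth (taken in whichever direction is larger) is below 1.25. Whether the 96% figure uses this definition is an assumption, and the function names below are hypothetical.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root-mean-square error between predicted and ground-truth distances."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred))


def threshold_accuracy(pred: Sequence[float], gt: Sequence[float], thresh: float = 1.25) -> float:
    """Fraction of samples with max(p/g, g/p) < thresh (assumed metric, see lead-in).

    Both distances must be positive for the ratio to be meaningful.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if any(v <= 0 for v in (*pred, *gt)):
        raise ValueError("distances must be positive")
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < thresh)
    return hits / len(pred)
```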
Abstract
Determining the distance between objects in a scene and the camera sensor from 2D images is feasible by estimating depth images with stereo or 3D cameras. Depth estimation yields relative distances, which must be converted to absolute distances to be applicable in practice. However, distance estimation is very challenging with 2D monocular cameras. This paper presents a deep learning framework that consists of two deep networks, one for depth estimation and one for object detection, both operating on a single image. First, objects in the scene are detected and localized using the You Only Look Once (YOLOv5) network. In parallel, a depth image is estimated with a deep autoencoder network to obtain the relative distances. The YOLO-based object detector was trained with supervised learning, whereas the depth estimation network was self-supervised…
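The abstract states that the depth network produces relative distances that are then turned into absolute ones, but it is cut off before explaining how. One standard way to do this, shown purely as an illustration, is to fit a single scale factor by least squares against a few reference points whose metric distance is known. The whole relative depth map is then multiplied by that factor. The sketch assumes the network outputs values proportional to metric depth (not inverse depth or disparity). `fit_scale` and `to_absolute` are hypothetical names, and the paper may use a different conversion.

```python
from typing import List, Sequence


def fit_scale(relative: Sequence[float], metric: Sequence[float]) -> float:
    """Least-squares scale s minimizing sum((s * r - m) ** 2).

    Closed form: s = sum(r * m) / sum(r * r), using reference points whose
    metric distance is known (an assumption; the paper's method is not shown).
    """
    if len(relative) != len(metric) or not relative:
        raise ValueError("need non-empty reference lists of equal length")
    num = sum(r * m for r, m in zip(relative, metric))
    den = sum(r * r for r in relative)
    if den == 0:
        raise ValueError("relative depths are all zero")
    return num / den


def to_absolute(depth_map: Sequence[Sequence[float]], scale: float) -> List[List[float]]:
    """Multiply every relative depth by the fitted scale to get metric depth."""
    return [[scale * v for v in row] for row in depth_map]
```

Setting the derivative of the squared error with respect to s to zero gives the closed form used in `fit_scale`. Taking the median of per-point ratios m/r is a more outlier-robust alternative.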
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: You Only Look Once
