Real-time Monocular Depth Estimation on Embedded Systems
Cheng Feng, Congxuan Zhang, Zhen Chen, Weiming Hu, Liyue Ge

TL;DR
This paper introduces two lightweight neural network architectures, RT-MonoDepth and RT-MonoDepth-S, designed for real-time monocular depth estimation on embedded systems, achieving high accuracy and fast inference speeds suitable for autonomous vehicles.
Contribution
The paper presents two efficient, lightweight architectures, RT-MonoDepth and RT-MonoDepth-S, that speed up inference on embedded platforms while keeping accuracy comparable to prior monocular depth estimation methods.
Findings
RT-MonoDepth achieves 18.4 FPS on Jetson Nano.
RT-MonoDepth-S achieves 30.5 FPS on Jetson Nano.
Both models are faster than existing methods while achieving comparable accuracy on the KITTI dataset.
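To put the Jetson Nano frame rates above in terms of a per-frame time budget, the short sketch below converts FPS to milliseconds per frame. It assumes each frame is processed one at a time, so latency is simply the reciprocal of throughput; this summary does not state how the frame rates were measured, and the helper name `fps_to_latency_ms` is illustrative only.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame.

    Assumes sequential, one-frame-at-a-time inference (no batching or
    pipelining), which is an assumption and not stated in the summary.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Jetson Nano frame rates as reported in the findings above.
jetson_nano_fps = {"RT-MonoDepth": 18.4, "RT-MonoDepth-S": 30.5}

latency_ms = {name: fps_to_latency_ms(fps) for name, fps in jetson_nano_fps.items()}

for name, ms in latency_ms.items():
    print(f"{name}: {ms:.1f} ms per frame")
# RT-MonoDepth: 54.3 ms per frame
# RT-MonoDepth-S: 32.8 ms per frame
```

Under the same assumption, RT-MonoDepth-S fits within a 33.3 ms budget (30 FPS) on the Jetson Nano, while RT-MonoDepth does not.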
Abstract
Depth sensing is of paramount importance for unmanned aerial and autonomous vehicles. Nonetheless, contemporary monocular depth estimation methods that rely on complex deep convolutional neural networks are too slow for real-time inference on embedded platforms. This paper addresses this challenge by proposing two efficient and lightweight architectures, RT-MonoDepth and RT-MonoDepth-S, which reduce computational complexity and latency. Our methods not only attain accuracy comparable to prior depth estimation methods but also yield faster inference speeds. Specifically, RT-MonoDepth and RT-MonoDepth-S achieve frame rates of 18.4 and 30.5 FPS on NVIDIA Jetson Nano and 253.0 and 364.1 FPS on Jetson AGX Orin, respectively, using a single RGB image of resolution 640×192. The experimental results underscore the superior accuracy and faster inference…
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging · Robotics and Sensor-Based Localization
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
