Real-time Monocular Depth Estimation on Embedded Systems

Cheng Feng; Congxuan Zhang; Zhen Chen; Weiming Hu; Liyue Ge

arXiv:2308.10569 · cs.CV · June 10, 2024

Open Access

TL;DR

This paper introduces two lightweight neural network architectures, RT-MonoDepth and RT-MonoDepth-S, designed for real-time monocular depth estimation on embedded systems, achieving high accuracy and fast inference speeds suitable for autonomous vehicles.

Contribution

The paper presents novel efficient architectures that significantly improve inference speed while maintaining accuracy for monocular depth estimation on embedded platforms.

Findings

01

RT-MonoDepth achieves 18.4 FPS on Jetson Nano.

02

RT-MonoDepth-S achieves 30.5 FPS on Jetson Nano.

03

Both models run faster than existing methods while achieving comparable accuracy on the KITTI dataset.

Abstract

Depth sensing is of paramount importance for unmanned aerial and autonomous vehicles. Nonetheless, contemporary monocular depth estimation methods built on complex convolutional neural networks are too slow for real-time inference on embedded platforms. This paper addresses this challenge by proposing two efficient and lightweight architectures, RT-MonoDepth and RT-MonoDepth-S, which reduce computational complexity and latency. Our methods not only attain accuracy comparable to prior depth estimation methods but also yield faster inference speeds. Specifically, RT-MonoDepth and RT-MonoDepth-S achieve frame rates of 18.4 and 30.5 FPS, respectively, on NVIDIA Jetson Nano and 253.0 and 364.1 FPS on Jetson AGX Orin, using a single RGB image of resolution 640x192. The experimental results underscore the superior accuracy and faster inference…
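To put the reported frame rates in terms of a per-frame time budget, the following minimal Python sketch converts them to milliseconds per frame. It assumes one image is processed at a time with no pipelining, so latency is simply the reciprocal of throughput; that assumption, and the helper names `latency_ms` and `summarize`, are illustrative and not taken from the paper.

```python
# Frame rates as reported in the abstract (single 640x192 RGB input).
REPORTED_FPS = {
    "Jetson Nano": {"RT-MonoDepth": 18.4, "RT-MonoDepth-S": 30.5},
    "Jetson AGX Orin": {"RT-MonoDepth": 253.0, "RT-MonoDepth-S": 364.1},
}


def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds implied by a throughput in FPS.

    Assumes frames are processed one at a time (no batching or pipelining).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def summarize(reported=REPORTED_FPS):
    """Return {platform: {model: latency in ms}} for the reported frame rates."""
    return {
        platform: {model: latency_ms(fps) for model, fps in models.items()}
        for platform, models in reported.items()
    }


if __name__ == "__main__":
    for platform, models in summarize().items():
        for model, ms in models.items():
            print(f"{platform:16s} {model:15s} {ms:6.2f} ms/frame")
```

Under this assumption, a 30 FPS camera leaves about 33.3 ms per frame, so the 30.5 FPS reported for RT-MonoDepth-S on Jetson Nano would just fit that budget, while the 18.4 FPS reported for RT-MonoDepth would not.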

Taxonomy

Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging · Robotics and Sensor-Based Localization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings