J-MOD$^{2}$: Joint Monocular Obstacle Detection and Depth Estimation
Michele Mancini, Gabriele Costante, Paolo Valigi, Thomas A. Ciarfuglia

TL;DR
This paper introduces J-MOD$^{2}$, an end-to-end deep learning architecture that jointly detects obstacles and estimates their depth for MAV navigation, improving robustness without relying on complex 3D mapping.
Contribution
The paper presents a joint obstacle detection and depth estimation network that uses information shared between the two tasks to increase robustness. It does not need to compute a global map, but remains compatible with a global SLAM system if one is used.
Findings
Outperforms state-of-the-art multi-task methods in obstacle detection and depth estimation.
Effective in diverse scenarios with different appearances and focal lengths.
Supports safe MAV navigation, as shown in simulated experiments.
Abstract
In this work, we propose an end-to-end deep architecture that jointly learns to detect obstacles and estimate their depth for MAV flight applications. Most existing approaches rely either on Visual SLAM systems or on depth estimation models to build 3D maps and detect obstacles. However, for the task of avoiding obstacles, this level of complexity is not required. Recent works have proposed multi-task architectures that perform both scene understanding and depth estimation. We follow their track and propose a specific architecture to jointly estimate depth and obstacles, without the need to compute a global map, but maintaining compatibility with a global SLAM system if needed. The network architecture is devised to exploit the joint information of the obstacle detection task, which produces more reliable bounding boxes, with the depth estimation one, increasing the robustness of…
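As a rough illustration of why pairing bounding boxes with a dense depth map is useful for obstacle avoidance, the sketch below assigns one depth value to each detected box and picks the nearest obstacle. It is not the authors' network or post-processing: the function names (`obstacle_depths`, `nearest_obstacle`), the box format, and the use of the median are all assumptions made for this example, and the depth map and boxes are toy values standing in for network outputs.

```python
from statistics import median


def obstacle_depths(depth_map, boxes):
    """Return the median depth inside each bounding box.

    depth_map: list of rows, each a list of per-pixel depths (e.g. metres).
    boxes: list of (x_min, y_min, x_max, y_max) in pixels; max edges are
           exclusive, and every box is assumed to cover at least one pixel.
    """
    results = []
    for x_min, y_min, x_max, y_max in boxes:
        # Collect every depth value that falls inside the box.
        pixels = [d for row in depth_map[y_min:y_max] for d in row[x_min:x_max]]
        # The median is less sensitive than the mean to background pixels
        # that leak into the box (an assumption, not the paper's choice).
        results.append(median(pixels))
    return results


def nearest_obstacle(depth_map, boxes):
    """Return (box_index, depth) of the closest obstacle, or None if no boxes."""
    depths = obstacle_depths(depth_map, boxes)
    if not depths:
        return None
    idx = min(range(len(depths)), key=depths.__getitem__)
    return idx, depths[idx]


# Toy 4x6 depth map: background at 9 m, one obstacle at 2 m, one at 5 m.
depth_map = [
    [9.0, 9.0, 9.0, 9.0, 9.0, 9.0],
    [9.0, 2.0, 2.0, 9.0, 5.0, 5.0],
    [9.0, 2.0, 2.0, 9.0, 5.0, 5.0],
    [9.0, 9.0, 9.0, 9.0, 9.0, 9.0],
]
boxes = [(1, 1, 3, 3), (4, 1, 6, 3)]

print(obstacle_depths(depth_map, boxes))
print(nearest_obstacle(depth_map, boxes))
```

A planner would only need the nearest obstacle's box and depth to decide whether to steer away, which is consistent with the abstract's point that a full 3D map is not required for avoidance.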
