TL;DR
This paper introduces an open-source off-road navigation stack that supports both LiDAR and monocular depth estimation, reports that the monocular setup performs comparably to LiDAR in many unstructured-terrain scenarios, and provides a reproducible benchmark.
Contribution
It presents a novel integration of foundation-model-based monocular depth estimation with SLAM for off-road navigation, without requiring task-specific training.
Findings
Monocular depth estimation matches LiDAR performance in many scenarios.
Edge-masking reduces obstacle hallucination, and temporal smoothing mitigates the impact of SLAM instability.
The open-source stack and simulation environment facilitate reproducible research.
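The two robustness enhancements named in the findings can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the relative-gradient criterion for detecting depth edges, the threshold `rel_grad_thresh`, and the use of an exponential moving average with factor `alpha` are all assumptions chosen for illustration.

```python
import numpy as np

def edge_mask(depth, rel_grad_thresh=0.1):
    """Return a boolean mask that is False where depth changes abruptly.

    Assumption: pixels on depth discontinuities are the ones most likely
    to produce spurious points, so they are dropped before back-projection.
    """
    dy, dx = np.gradient(depth)                 # per-axis finite differences
    grad = np.hypot(dx, dy)                     # gradient magnitude
    return grad / np.maximum(depth, 1e-6) < rel_grad_thresh

def smooth_scale(prev_scale, new_scale, alpha=0.2):
    """Exponential moving average of a per-frame metric scale estimate.

    Assumption: smoothing the scale over time damps jumps caused by
    unstable SLAM measurements.
    """
    if prev_scale is None:                      # first frame: nothing to blend
        return new_scale
    return (1.0 - alpha) * prev_scale + alpha * new_scale

# Toy depth image with a sharp step between a near and a far surface.
depth = np.full((4, 6), 2.0)
depth[:, 3:] = 10.0
mask = edge_mask(depth)
```

In this toy image only the two columns adjacent to the step are masked out; the flat regions on either side are kept.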
Abstract
Off-road autonomous navigation demands reliable 3D perception for robust obstacle detection in challenging unstructured terrain. While LiDAR is accurate, it is costly and power-intensive. Monocular depth estimation using foundation models offers a lightweight alternative, but its integration into outdoor navigation stacks remains underexplored. We present an open-source off-road navigation stack supporting both LiDAR and monocular 3D perception without task-specific training. For the monocular setup, we combine zero-shot depth prediction (Depth Anything V2) with metric depth rescaling using sparse SLAM measurements (VINS-Mono). Two key enhancements improve robustness: edge-masking to reduce obstacle hallucination and temporal smoothing to mitigate the impact of SLAM instability. The resulting point cloud is used to generate a robot-centric 2.5D elevation map for costmap-based planning.…
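The rescaling and mapping steps described in the abstract can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the relative prediction and the metric depth are related by a single scale and shift fitted by least squares, that the fit is done directly in depth space (a network that outputs inverse depth would need the fit in inverse-depth space), and that each elevation cell stores the maximum observed height. The names `fit_scale_shift` and `elevation_map`, and the grid size and resolution, are hypothetical. Back-projection of the rescaled depth image into a point cloud is omitted.

```python
import numpy as np

def fit_scale_shift(rel_depth, sparse_uv, sparse_metric):
    """Least-squares a, b such that a * rel + b approximates metric depth
    at the sparse pixels (u, v) where SLAM provides a measurement."""
    u, v = sparse_uv[:, 0], sparse_uv[:, 1]
    rel = rel_depth[v, u]                          # image indexing is [row, col]
    A = np.stack([rel, np.ones_like(rel)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, sparse_metric, rcond=None)
    return a, b

def elevation_map(points, size_m=10.0, res_m=0.5):
    """Robot-centric 2.5D grid holding the max height per cell.

    points: (N, 3) array of x, y, z in the robot frame.
    Cells with no observation stay NaN.
    """
    n = int(round(size_m / res_m))
    grid = np.full((n, n), np.nan)
    ij = np.floor((points[:, :2] + size_m / 2) / res_m).astype(int)
    inside = np.all((ij >= 0) & (ij < n), axis=1)  # drop points off the grid
    for (i, j), z in zip(ij[inside], points[inside, 2]):
        if np.isnan(grid[i, j]) or z > grid[i, j]:
            grid[i, j] = z
    return grid

# Synthetic check: metric depth is exactly 2 * rel + 1.
rel_depth = np.arange(16, dtype=float).reshape(4, 4)
sparse_uv = np.array([[0, 0], [1, 2], [3, 3]])     # (u, v) pixel coordinates
sparse_metric = 2.0 * rel_depth[sparse_uv[:, 1], sparse_uv[:, 0]] + 1.0
a, b = fit_scale_shift(rel_depth, sparse_uv, sparse_metric)
metric_depth = a * rel_depth + b

# Two points fall in the same cell; the third lies outside the map.
points = np.array([[0.1, 0.1, 0.2], [0.2, 0.2, 0.7], [100.0, 0.0, 1.0]])
grid = elevation_map(points)
```

A costmap for planning would then be derived from `grid`, for example from height differences between neighbouring cells; that step is outside this sketch.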
