# Metric scale non-fixed obstacles distance estimation using a 3D map and a monocular camera

**Authors:** Daijiro Higashi, Naoki Fukuta, Tsuyoshi Tasaki

PMC · DOI: 10.3389/frobt.2025.1560342 · 2025-06-12

## TL;DR

This paper introduces DifSeg, a loss function that improves metric-scale distance estimation for non-fixed obstacles (obstacles absent from the 3D map) in autonomous driving with a monocular camera and a 3D map.

## Contribution

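A novel loss function, DifSeg, is computed from the distance estimation results within the non-fixed obstacle region, which is defined from the object detection results. It makes PMOD-Net focus on non-fixed obstacles during training and so improves their distance estimation accuracy.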

## Key findings

- DifSeg improved distance estimation accuracy across CARLA, KITTI, and an indoor dataset.
- On KITTI, the method's distance estimation error was 2.42 m, which is 2.14 m lower than that of the latest monocular depth estimation method.
- The new approach focuses training on non-fixed obstacles, addressing PMOD-Net's large distance error for obstacles that do not exist on the 3D map.

## Abstract

Obstacle avoidance is important for autonomous driving. Metric scale obstacle detection using a monocular camera for obstacle avoidance has been studied. In this study, metric scale obstacle detection means detecting obstacles and measuring the distance to them with a metric scale. We have already developed PMOD-Net, which realizes metric scale obstacle detection by using a monocular camera and a 3D map for autonomous driving. However, PMOD-Net’s distance error of non-fixed obstacles that do not exist on the 3D map is large. Accordingly, this study deals with the problem of improving distance estimation of non-fixed obstacles for obstacle avoidance. To solve the problem, we focused on the fact that PMOD-Net simultaneously performed object detection and distance estimation. We have developed a new loss function called “DifSeg.” DifSeg is calculated from the distance estimation results on the non-fixed obstacle region, which is defined based on the object detection results. Therefore, DifSeg makes PMOD-Net focus on non-fixed obstacles during training. We evaluated the effect of DifSeg by using CARLA simulator, KITTI, and an original indoor dataset. The evaluation results showed that the distance estimation accuracy was improved on all datasets. Especially in the case of KITTI, the distance estimation error of our method was 2.42 m, which was 2.14 m less than that of the latest monocular depth estimation method.
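The abstract describes DifSeg only at a high level: a loss computed from distance estimates inside the non-fixed obstacle region, with that region taken from the object detection results. The following is a minimal sketch of that idea, not the authors' implementation. The function name `difseg_like_loss`, the use of mean absolute error, the per-pixel label layout, and the class ids in `NON_FIXED_IDS` are all assumptions made for illustration.

```python
from typing import List, Set


def difseg_like_loss(
    pred_dist: List[List[float]],
    true_dist: List[List[float]],
    pred_labels: List[List[int]],
    non_fixed_ids: Set[int],
) -> float:
    """Mean absolute distance error over pixels labelled as non-fixed obstacles.

    Assumption: the abstract does not state the error norm; L1 is used here
    purely as an example.
    """
    total = 0.0
    count = 0
    for pred_row, true_row, label_row in zip(pred_dist, true_dist, pred_labels):
        for p, t, label in zip(pred_row, true_row, label_row):
            # Only pixels inside the detected non-fixed obstacle region contribute.
            if label in non_fixed_ids:
                total += abs(p - t)
                count += 1
    # With no non-fixed obstacle in view, the term contributes nothing.
    return total / count if count else 0.0


# Hypothetical class ids (e.g. 1 = vehicle, 2 = pedestrian); 0 = fixed background.
NON_FIXED_IDS = {1, 2}

pred_labels = [[0, 1, 1], [0, 0, 2]]                  # detection output per pixel
pred_dist = [[10.0, 5.0, 6.0], [12.0, 11.0, 8.0]]     # estimated distance in metres
true_dist = [[10.5, 4.0, 6.5], [12.0, 11.0, 10.0]]    # reference distance in metres

loss = difseg_like_loss(pred_dist, true_dist, pred_labels, NON_FIXED_IDS)
print(f"masked distance loss: {loss:.3f} m")
```

In this toy example the background pixel error (0.5 m at the top-left) is ignored, so the loss reflects only the three obstacle pixels. This is the sense in which such a term would steer training toward non-fixed obstacles.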

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12198967/full.md

---
Source: https://tomesphere.com/paper/PMC12198967