# How do neural networks see depth in single images?

**Authors:** Tom van Dijk, Guido C.H.E. de Croon

arXiv: 1905.07005 · 2019-05-20

## TL;DR

This paper analyzes how a neural network (MonoDepth) estimates depth from single images. It finds that the network relies on the vertical position of objects in the image and only partially corrects for camera pitch and roll, so changes in camera pose bias its depth predictions.

## Contribution

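It provides what the authors believe is the first detailed analysis of the visual cues that a neural network (MonoDepth by Godard et al.) uses for monocular depth estimation. The analysis highlights the network's reliance on vertical image position and on a strong edge at an object's ground contact point.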

## Key findings

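- The network ignores the apparent size of known obstacles and uses their vertical position in the image to estimate depth.
- The network only partially corrects for camera pitch and roll, so changes in camera pose bias the estimated depth towards obstacles.
- Relying on vertical position lets the network estimate the distance to obstacles not seen during training, provided there is a strong edge at the object's ground contact point.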
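
The vertical-position cue follows from simple ground-plane geometry. Below is a minimal sketch, assuming an ideal pinhole camera at a known height above flat ground with zero roll: the image row where an object touches the ground fixes the viewing angle, and hence the distance. The functions `ground_distance` and `contact_row` and all numeric values (focal length, principal-point row, camera height, a 1° pitch) are hypothetical illustrations, not MonoDepth's code or the paper's experimental setup. The example also shows why camera pose matters: once the camera is pitched, the same image row maps to a different distance.

```python
import math


def ground_distance(y_contact: float, f: float, c_y: float,
                    cam_height: float, pitch_rad: float = 0.0) -> float:
    """Horizontal distance to where an object touches a flat ground plane.

    Hypothetical helper. Assumes an ideal pinhole camera `cam_height` metres
    above flat ground, pitched down by `pitch_rad` (positive = looking down),
    zero roll, and image rows that increase downwards. `f` and `c_y` are the
    focal length and principal-point row in pixels. For a level camera this
    horizontal distance equals depth along the optical axis.
    """
    # Angle of the viewing ray below the optical axis for this image row.
    below_axis = math.atan2(y_contact - c_y, f)
    # Angle of the same ray below the horizontal.
    below_horizon = pitch_rad + below_axis
    if below_horizon <= 0:
        raise ValueError("contact point is on or above the horizon")
    return cam_height / math.tan(below_horizon)


def contact_row(distance: float, f: float, c_y: float,
                cam_height: float, pitch_rad: float = 0.0) -> float:
    """Inverse of ground_distance: image row where a ground point projects."""
    below_horizon = math.atan2(cam_height, distance)
    return c_y + f * math.tan(below_horizon - pitch_rad)


# Made-up camera parameters, loosely in the range of a car-mounted camera.
F, C_Y, H = 720.0, 187.0, 1.65

# Level camera: a contact point 72 px below the principal point.
d_level = ground_distance(C_Y + 72.0, F, C_Y, H)  # 1.65 * 720 / 72 = 16.5 m

# Same obstacle at 16.5 m, but the camera is pitched down by 1 degree.
pitch = math.radians(1.0)
y_pitched = contact_row(16.5, F, C_Y, H, pitch_rad=pitch)
d_corrected = ground_distance(y_pitched, F, C_Y, H, pitch_rad=pitch)  # 16.5 m
d_uncorrected = ground_distance(y_pitched, F, C_Y, H)  # about 20 m: too far
```

Pitching the camera down shifts the whole scene upward in the image. An estimator that ignores the pitch therefore reads the higher contact row as a more distant obstacle. The paper's finding is that MonoDepth compensates for such pose changes only partially.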

## Abstract

Deep neural networks have led to a breakthrough in depth estimation from single images. Recent work often focuses on the accuracy of the depth map, where an evaluation on a publicly available test set such as the KITTI vision benchmark is often the main result of the article. While such an evaluation shows how well neural networks can estimate depth, it does not show how they do this. To the best of our knowledge, no work currently exists that analyzes what these networks have learned.

In this work we take the MonoDepth network by Godard et al. and investigate what visual cues it exploits for depth estimation. We find that the network ignores the apparent size of known obstacles in favor of their vertical position in the image. Using the vertical position requires the camera pose to be known; however, we find that MonoDepth only partially corrects for changes in camera pitch and roll and that these influence the estimated depth towards obstacles. We further show that MonoDepth's use of the vertical image position allows it to estimate the distance towards arbitrary obstacles, even those not appearing in the training set, but that it requires a strong edge at the ground contact point of the object to do so. In future work we will investigate whether these observations also apply to other neural networks for monocular depth estimation.
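
Why a strong ground-contact edge matters can be illustrated with a deliberately simple, hand-written heuristic. The heuristic scans one image column upward from the bottom and reports the first row where the intensity jumps by more than a threshold. This is not MonoDepth's mechanism, since the network's features are learned rather than hand-designed. The function `find_contact_row`, the threshold, and the toy intensity values are hypothetical. The point is only that when an object's base barely contrasts with the ground, no contact row is found, and a vertical-position cue has nothing to measure.

```python
from typing import Optional, Sequence


def find_contact_row(column: Sequence[float], threshold: float) -> Optional[int]:
    """Hypothetical heuristic: locate the lowest strong vertical edge.

    `column` holds grayscale values of one image column, ordered top to
    bottom. Returns the first row below the lowest intensity jump larger
    than `threshold`, or None if no jump is that strong.
    """
    # Walk upward from the bottom row, comparing each row with the one above.
    for row in range(len(column) - 1, 0, -1):
        if abs(column[row] - column[row - 1]) > threshold:
            return row
    return None


# Toy column: dark object (rows 0-5) standing on bright ground (rows 6-9).
strong = [40, 40, 40, 40, 40, 40, 200, 200, 200, 200]
# Low-contrast object whose base blends into the ground.
weak = [180, 180, 185, 185, 185, 190, 190, 190, 190, 190]
```

With a threshold of 50, `find_contact_row(strong, 50)` returns 6 and `find_contact_row(weak, 50)` returns `None`. Only in the first case would there be a contact row to convert into a distance with ground-plane geometry.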

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.07005/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1905.07005/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1905.07005/full.md

---
Source: https://tomesphere.com/paper/1905.07005