# Learning Single Camera Depth Estimation using Dual-Pixels

**Authors:** Rahul Garg, Neal Wadhwa, Sameer Ansari, Jonathan T. Barron

arXiv: 1904.05822 · 2019-08-15

## TL;DR

This paper estimates depth from the dual-pixel data captured by a single camera. It uses a model of dual-pixel optics to improve accuracy and to allow much smaller networks, and it evaluates the approach on a new dataset.

## Contribution

It identifies an inherent ambiguity in depth estimated from dual-pixel cues and develops a learning approach that estimates depth up to this ambiguity. This lets existing monocular depth estimation techniques be applied effectively to dual-pixel data.
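
The full paper derives the form of this ambiguity. As a simplified sketch, assume a thin-lens model in which dual-pixel disparity is proportional to the signed defocus blur size; the proportionality constant $\kappa$ is a hypothetical symbol introduced here, and $L$, $f$, $g$, $Z$ denote aperture diameter, focal length, focus distance, and scene depth. Under these assumptions, disparity turns out to be affine in inverse depth:

```latex
% Signed defocus blur diameter under a thin-lens model (assumes g > f)
b(Z) = \frac{L f}{1 - f/g}\left(\frac{1}{g} - \frac{1}{Z}\right)

% Assumed proportionality between dual-pixel disparity and blur size
d(Z) \approx \kappa\, b(Z) = \alpha + \frac{\beta}{Z},
\qquad \alpha = \frac{\kappa L f}{g - f},
\qquad \beta = -\frac{\kappa L f g}{g - f}

% Inverting: inverse depth is an affine function of disparity
\frac{1}{Z} = \frac{d - \alpha}{\beta}
```

Because $\alpha$ and $\beta$ depend on $L$, $f$, $g$, and $\kappa$, which can change from capture to capture and may not be known precisely, a model that observes only $d$ can recover $1/Z$ only up to an unknown scale and offset.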

## Key findings

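- On the authors' new dual-pixel task, the model is 30% more accurate than any prior learning-based monocular or stereoscopic depth estimation method.
- Dual-pixel auto-focus hardware, increasingly common on modern camera sensors, provides a usable depth cue from a single camera once its image formation is modeled.
- Much smaller models can still infer high-quality depth when trained to estimate depth up to the ambiguity.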

## Abstract

Deep learning techniques have enabled rapid progress in monocular depth estimation, but their quality is limited by the ill-posed nature of the problem and the scarcity of high-quality datasets. We estimate depth from a single camera by leveraging the dual-pixel auto-focus hardware that is increasingly common on modern camera sensors. Classic stereo algorithms and prior learning-based depth estimation techniques under-perform when applied to this dual-pixel data, the former due to too-strong assumptions about RGB image matching, and the latter due to not leveraging the understanding of optics of dual-pixel image formation. To allow learning-based methods to work well on dual-pixel imagery, we identify an inherent ambiguity in the depth estimated from dual-pixel cues, and develop an approach to estimate depth up to this ambiguity. Using our approach, existing monocular depth estimation techniques can be effectively applied to dual-pixel data, and much smaller models can be constructed that still infer high-quality depth. To demonstrate this, we capture a large dataset of in-the-wild 5-viewpoint RGB images paired with corresponding dual-pixel data, and show how view supervision with this data can be used to learn depth up to the unknown ambiguities. On our new task, our model is 30% more accurate than any prior work on learning-based monocular or stereoscopic depth estimation.
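
Learning depth "up to the unknown ambiguities" means predictions should be scored in a way that ignores an unknown scale and offset. The full paper argues that the ambiguity is an affine transform of inverse depth and evaluates with affine-invariant metrics. The minimal sketch below is not the authors' code: the function name `affine_invariant_error` and the choice of a weighted L2 error are assumptions for illustration. It fits the best affine map from predicted to ground-truth inverse depth by weighted least squares and reports the remaining RMS error.

```python
import numpy as np


def affine_invariant_error(pred_inv_depth, gt_inv_depth, weights=None):
    """Weighted RMS error after fitting gt ~= a * pred + b.

    Hypothetical helper for illustration; a weighted L2 variant of an
    affine-invariant error, not the paper's implementation.
    """
    pred = np.asarray(pred_inv_depth, dtype=float).ravel()
    gt = np.asarray(gt_inv_depth, dtype=float).ravel()
    if weights is None:
        w = np.ones_like(pred)
    else:
        w = np.asarray(weights, dtype=float).ravel()

    # Weighted least squares: scale each row by sqrt(w) and solve for (a, b).
    sqrt_w = np.sqrt(w)
    design = np.stack([pred, np.ones_like(pred)], axis=1) * sqrt_w[:, None]
    target = gt * sqrt_w
    (a, b), *_ = np.linalg.lstsq(design, target, rcond=None)

    # Error that remains after removing the best-fitting scale and offset.
    residual = a * pred + b - gt
    return float(np.sqrt(np.sum(w * residual**2) / np.sum(w)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(0.2, 2.0, size=(64, 64))  # synthetic inverse depth
    # An affine distortion of inverse depth is not penalized (prints ~0).
    print(affine_invariant_error(3.0 * gt - 0.5, gt))
    # A non-affine distortion leaves a nonzero error.
    print(affine_invariant_error(gt**3, gt))
```

Fitting the scale and offset in closed form per image is what makes the score indifferent to the unknown per-capture lens parameters; an L1 or robust variant would replace the least-squares fit with an iterative solver.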

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.05822/full.md

## Figures

27 figures with captions in the complete paper: https://tomesphere.com/paper/1904.05822/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/1904.05822/full.md

---
Source: https://tomesphere.com/paper/1904.05822