TL;DR
This paper introduces a novel depth estimation method that uses defocus cues as domain-invariant supervision, enabling models trained on synthetic data to generalize effectively to real-world images.
Contribution
The authors propose a permutation-invariant CNN that uses defocus maps as an intermediate supervisory signal to bridge the synthetic-to-real domain gap in depth estimation.
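As a rough illustration of what permutation invariance over a set of differently focused images means, the sketch below applies the same function to every image and then merges the results with an element-wise max, so the order of the images cannot affect the output. This is not the authors' architecture: `sharpness_map` is a hypothetical stand-in for a shared CNN branch, and max pooling is only one assumed choice of symmetric operation.

```python
import random

def sharpness_map(image):
    """Hypothetical stand-in for a shared CNN branch:
    absolute horizontal gradient between neighbouring pixels."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
            for row in image]

def pool_over_stack(focal_stack):
    """Apply the same function to every image, then take an
    element-wise max across images (a symmetric operation)."""
    maps = [sharpness_map(img) for img in focal_stack]
    height, width = len(maps[0]), len(maps[0][0])
    return [[max(m[y][x] for m in maps) for x in range(width)]
            for y in range(height)]

random.seed(0)
# Toy "focal stack": 5 images of 6 x 8 random intensities (made-up data).
focal_stack = [[[random.random() for _ in range(8)] for _ in range(6)]
               for _ in range(5)]

# Same images, different order.
shuffled_stack = focal_stack[:]
random.shuffle(shuffled_stack)

pooled = pool_over_stack(focal_stack)
pooled_shuffled = pool_over_stack(shuffled_stack)
print(pooled == pooled_shuffled)
```

Because the max is taken over the same set of values regardless of ordering, the two pooled maps are identical. Any other symmetric reduction (sum, mean) would have the same property.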
Findings
Achieves state-of-the-art depth prediction on real datasets
Demonstrates effective generalization from synthetic to real images
Uses defocus cues as domain-invariant supervision
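One way to see why defocus could carry depth information independently of image appearance is the standard thin-lens circle-of-confusion formula, in which blur size depends only on scene depth and camera optics. The sketch below is a minimal illustration under that assumption; this summary does not state which blur model the authors use, and the function name and the camera parameters in the example are hypothetical.

```python
def circle_of_confusion(depth_m, focus_m, focal_length_m, f_number):
    """Blur-circle diameter (metres, on the sensor) for a point at depth_m
    when a thin lens is focused at focus_m. All distances in metres."""
    aperture = focal_length_m / f_number  # aperture diameter
    return (aperture * abs(depth_m - focus_m) / depth_m
            * focal_length_m / (focus_m - focal_length_m))

# Hypothetical camera: 50 mm lens at f/2, focused at 1 m.
for depth in (1.0, 2.0, 4.0):
    c = circle_of_confusion(depth, focus_m=1.0, focal_length_m=0.05, f_number=2.0)
    print(f"depth {depth:.1f} m -> blur diameter {c * 1000:.3f} mm")
```

A point on the focal plane has zero blur, and blur grows as the point moves away from it. Note that the absolute value makes the mapping ambiguous on either side of the focal plane for a single image, which is one reason images with different focus points are informative.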
Abstract
Data-driven depth estimation methods struggle to generalize outside their training scenes due to the immense variability of real-world scenes. This problem can be partially addressed by utilising synthetically generated images, but closing the synthetic-real domain gap is far from trivial. In this paper, we tackle this issue by using domain-invariant defocus blur as direct supervision. We leverage defocus cues by using a permutation-invariant convolutional neural network that encourages the network to learn from the differences between images with a different point of focus. Our proposed network uses the defocus map as an intermediate supervisory signal. We are able to train our model completely on synthetic data and directly apply it to a wide range of real-world images. We evaluate our model on synthetic and real datasets, showing compelling generalization results and…
Videos
Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation (YouTube)
