Zero-Shot Depth from Defocus
Yiming Zuo, Hongyu Wen, Venkat Subramanian, Patrick Chen, Karhan Kayan, Mario Bijelic, Felix Heide, Jia Deng

TL;DR
This paper targets zero-shot generalization in depth from defocus (DfD) with three components: a new real-world benchmark (ZEDD), a Transformer-based network called FOSSA, and a training pipeline that generates synthetic focus stacks from existing RGBD datasets.
Contribution
The paper presents a new real-world DfD benchmark, ZEDD; a Transformer-based network, FOSSA, whose stack attention layer uses a focus distance embedding; and a training pipeline that generates synthetic focus stacks from large-scale RGBD datasets.
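To make the idea of deriving focus stacks from RGBD data concrete, here is a minimal sketch of the standard thin-lens circle-of-confusion model, which gives a per-pixel blur size from a depth map and a focus distance. It is an illustration under stated assumptions, not the paper's pipeline: the source does not describe the authors' rendering method, and the function names `coc_diameter` and `blur_radius_maps`, as well as the lens parameters, are hypothetical. The blurring step itself is omitted.

```python
def coc_diameter(depth_m, focus_m, focal_length_m, f_number):
    """Thin-lens circle-of-confusion diameter on the sensor, in metres.

    Assumed model (not taken from the paper):
        c = A * f * |d - s| / (d * (s - f))
    with aperture diameter A = f / N, focal length f, focus distance s,
    and object depth d.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    if focus_m <= focal_length_m:
        raise ValueError("focus distance must exceed the focal length")
    aperture_m = focal_length_m / f_number
    return (aperture_m * focal_length_m * abs(depth_m - focus_m)
            / (depth_m * (focus_m - focal_length_m)))


def blur_radius_maps(depth_map, focus_distances, focal_length_m,
                     f_number, pixel_pitch_m):
    """Return one blur-radius map (in pixels) per focus distance.

    depth_map is a list of rows of metric depths. Each output map has the
    same shape; a value of 0 means the pixel is in perfect focus.
    """
    maps = []
    for focus_m in focus_distances:
        radius_map = [
            [coc_diameter(d, focus_m, focal_length_m, f_number)
             / (2.0 * pixel_pitch_m) for d in row]
            for row in depth_map
        ]
        maps.append(radius_map)
    return maps


if __name__ == "__main__":
    # Hypothetical 2x2 depth map (metres) and a two-slice focus stack.
    depth = [[1.0, 2.0], [4.0, 8.0]]
    radii = blur_radius_maps(depth, [1.0, 4.0], focal_length_m=0.05,
                             f_number=2.0, pixel_pitch_m=5e-6)
    for focus_m, radius_map in zip([1.0, 4.0], radii):
        print(focus_m, radius_map)
```

Each slice of a synthetic focus stack would then be obtained by blurring the all-in-focus RGB image with a kernel whose size follows the corresponding radius map; the choice of kernel and how occlusion boundaries are handled are left open here.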
Findings
Errors drop by up to 55.7% on the evaluated benchmarks.
FOSSA outperforms previous methods in zero-shot generalization.
The ZEDD benchmark contains 8.3 times more scenes than previous benchmarks.
Abstract
Depth from Defocus (DfD) is the task of estimating a dense metric depth map from a focus stack. Unlike previous works overfitting to a certain dataset, this paper focuses on the challenging and practical setting of zero-shot generalization. We first propose a new real-world DfD benchmark ZEDD, which contains 8.3x more scenes and significantly higher quality images and ground-truth depth maps compared to previous benchmarks. We also design a novel network architecture named FOSSA. FOSSA is a Transformer-based architecture with novel designs tailored to the DfD task. The key contribution is a stack attention layer with a focus distance embedding, allowing efficient information exchange across the focus stack. Finally, we develop a new training data pipeline allowing us to utilize existing large-scale RGBD datasets to generate synthetic focus stacks. Experiment results on ZEDD and other…
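The stack attention idea mentioned in the abstract can be illustrated with a toy example: features from the same spatial location in each focus slice attend to one another, after each slice's feature is tagged with an embedding of its focus distance. The sketch below is an assumption-heavy illustration, not FOSSA's actual layer. The sinusoidal form of the embedding, the absence of learned query/key/value projections, and the names `focus_embedding` and `stack_attention` are all hypothetical.

```python
import math


def focus_embedding(focus_m, dim):
    """Sinusoidal embedding of a scalar focus distance (assumed form).

    dim must be even; returns a list of length dim.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (10000.0 ** (2.0 * i / dim))
        emb.append(math.sin(focus_m * freq))
        emb.append(math.cos(focus_m * freq))
    return emb


def stack_attention(features, focus_distances):
    """Single-head dot-product attention across the stack dimension.

    features: one feature vector per focus slice, all taken at the same
    spatial location. Returns one updated vector per slice. Learned
    projections are omitted to keep the sketch short.
    """
    dim = len(features[0])
    # Tag each slice's feature with its focus distance.
    tokens = [
        [x + e for x, e in zip(feat, focus_embedding(dist, dim))]
        for feat, dist in zip(features, focus_distances)
    ]
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in tokens]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output is a weighted average of all slices' tokens.
        outputs.append([
            sum(w * tok[j] for w, tok in zip(weights, tokens))
            for j in range(dim)
        ])
    return outputs


if __name__ == "__main__":
    # Hypothetical 3-slice stack with 4-dimensional features.
    feats = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
    print(stack_attention(feats, [0.5, 1.0, 2.0]))
```

Because attention here runs only over the slices at a single location, its cost grows with the square of the stack size rather than the square of the number of image tokens, which is one plausible reading of the "efficient information exchange" the abstract describes.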
