Zero-Shot Depth from Defocus

Yiming Zuo; Hongyu Wen; Venkat Subramanian; Patrick Chen; Karhan Kayan; Mario Bijelic; Felix Heide; Jia Deng

arXiv:2603.26658 · cs.CV · March 30, 2026

Resources: 1 repository, 4 models, 3 datasets

TL;DR

This paper studies depth from defocus (DfD) in the zero-shot generalization setting. It introduces a new benchmark (ZEDD), a Transformer-based network architecture called FOSSA, and a training pipeline that generates synthetic focus stacks from existing RGBD datasets.

Contribution

It presents a new real-world DfD benchmark, ZEDD; a Transformer-based network, FOSSA, whose stack attention layer uses a focus distance embedding; and a training pipeline that generates synthetic focus stacks from large-scale RGBD datasets.
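The rendering details of the training pipeline are not given on this page. As a hedged illustration of how an RGBD pair (an all-in-focus image plus a metric depth map) can be turned into a synthetic focus stack, the sketch below uses the standard thin-lens circle-of-confusion (CoC) formula, c = f^2 |d - s| / (N (s - f) d), followed by a per-pixel Gaussian blur. The function names, the lens defaults, and the choice of sigma = c / 2 are assumptions for illustration only, and the gather-style blur ignores occlusion at depth edges; FOSSA's actual pipeline may differ.

```python
import math


def coc_diameter_px(depth_m, focus_m, focal_m, f_number, pixel_pitch_m):
    """Thin-lens circle-of-confusion diameter on the sensor, in pixels."""
    assert depth_m > 0 and focus_m > focal_m > 0
    aperture_m = focal_m / f_number  # aperture diameter A = f / N
    coc_m = aperture_m * (focal_m / (focus_m - focal_m)) * abs(depth_m - focus_m) / depth_m
    return coc_m / pixel_pitch_m


def render_slice(image, depth, focus_m, focal_m=0.01, f_number=4.0, pixel_pitch_m=5e-6):
    """Blur a grayscale all-in-focus image as if focused at focus_m (gather approximation).

    Hypothetical helper: lens defaults are illustrative, not taken from the paper.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Assumption: Gaussian standard deviation = half the CoC diameter.
            sigma = 0.5 * coc_diameter_px(depth[y][x], focus_m, focal_m, f_number, pixel_pitch_m)
            if sigma < 1e-3:  # effectively in focus: copy the pixel unchanged
                out[y][x] = image[y][x]
                continue
            r = max(1, math.ceil(3 * sigma))
            acc = wsum = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        wt = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                        acc += wt * image[yy][xx]
                        wsum += wt
            out[y][x] = acc / wsum  # normalising also handles the image borders
    return out


def render_focus_stack(image, depth, focus_distances, **lens):
    """Render one synthetic slice per focus distance; lens kwargs go to render_slice."""
    return [render_slice(image, depth, s, **lens) for s in focus_distances]
```

Each slice blurs a pixel more the farther its depth lies from that slice's focus distance, so a DfD network that receives the stack together with its focus distances has to invert this process to recover metric depth.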

Findings

1. Errors are reduced by up to 55.7% on the evaluated benchmarks.
2. FOSSA outperforms previous methods in zero-shot generalization.
3. The ZEDD benchmark contains 8.3 times more scenes than prior DfD benchmarks.

Abstract

Depth from Defocus (DfD) is the task of estimating a dense metric depth map from a focus stack. Unlike previous works that overfit to a specific dataset, this paper focuses on the challenging and practical setting of zero-shot generalization. We first propose a new real-world DfD benchmark, ZEDD, which contains 8.3x more scenes and significantly higher-quality images and ground-truth depth maps than previous benchmarks. We also design a novel network architecture named FOSSA, a Transformer-based architecture with novel designs tailored to the DfD task. The key contribution is a stack attention layer with a focus distance embedding, allowing efficient information exchange across the focus stack. Finally, we develop a new training data pipeline that allows us to use existing large-scale RGBD datasets to generate synthetic focus stacks. Experimental results on ZEDD and other…
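The abstract does not spell out the internals of the stack attention layer. A minimal sketch of the general idea, under stated assumptions: each slice's features receive an embedding of that slice's focus distance, and then the N tokens at each pixel (one per slice) attend only to one another. The sinusoidal embedding of inverse focus distance (diopters), the identity query/key/value projections, and all function names are hypothetical simplifications, not code from princeton-vl/FOSSA.

```python
import math


def focus_distance_embedding(focus_m, dim):
    """Hypothetical sinusoidal embedding of a focus distance, encoded in diopters (1 / metres)."""
    assert focus_m > 0 and dim % 2 == 0
    diopters = 1.0 / focus_m
    emb = []
    for i in range(dim // 2):
        # Geometric frequency ladder, as in Transformer positional encodings.
        freq = 10000.0 ** (-2.0 * i / dim)
        emb.append(math.sin(diopters * freq))
        emb.append(math.cos(diopters * freq))
    return emb


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def stack_attention(features, focus_distances):
    """features[n][p] is a C-dim feature vector for focus slice n at pixel p.

    Returns a list of the same shape. Attention runs only along the stack axis:
    the N tokens at one pixel attend to each other (identity Q/K/V projections,
    a simplification; a real layer would use learned projections).
    """
    n_slices, n_pixels = len(features), len(features[0])
    dim = len(features[0][0])
    embs = [focus_distance_embedding(s, dim) for s in focus_distances]
    # Add each slice's focus-distance embedding so tokens know where their slice was focused.
    tokens = [[[f + e for f, e in zip(vec, embs[n])] for vec in features[n]]
              for n in range(n_slices)]
    scale = 1.0 / math.sqrt(dim)
    out = [[None] * n_pixels for _ in range(n_slices)]
    for p in range(n_pixels):
        stack = [tokens[n][p] for n in range(n_slices)]  # the N tokens at pixel p
        for i, q in enumerate(stack):
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in stack]
            weights = softmax(scores)
            out[i][p] = [sum(w * v[c] for w, v in zip(weights, stack)) for c in range(dim)]
    return out
```

Restricting attention to the stack axis costs O(P * N^2 * C) for P pixels, N slices, and C channels, versus O((P * N)^2 * C) for full attention over every token, which is one plausible reading of the abstract's "efficient information exchange across the focus stack".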

Code & Models

Repositories

princeton-vl/FOSSA (GitHub)
