Focusable Monocular Depth Estimation
Yuxin Du, Tao Lin, Zile Zhong, Runting Li, Xiyao Chen, Jiting Liu, Chenglin Liu, Ying-Cong Chen, Yuqian Fu, and Bo Zhao

TL;DR
This paper introduces Focusable Monocular Depth Estimation (FDE), a task and framework that enables depth models to prioritize user-specified regions, improving accuracy at boundaries and foregrounds while maintaining scene coherence.
Contribution
The paper proposes FocusDepth, a prompt-conditioned depth estimation framework with Multi-Scale Spatial-Aligned Fusion, and establishes FDE-Bench, a new benchmark for target-centric depth estimation.
Findings
FocusDepth outperforms baseline models on FDE-Bench in target regions.
MSSA's spatial alignment is crucial for prompt-guided depth accuracy.
FocusDepth achieves significant improvements in boundary and foreground regions.
Abstract
Monocular depth foundation models generalize well across scenes, yet they are typically optimized with uniform pixel-wise objectives that do not distinguish user-specified or task-relevant target regions from the surrounding context. We therefore introduce Focusable Monocular Depth Estimation (FDE), a region-aware depth estimation task in which, given a specified target region, the model is required to prioritize foreground depth accuracy, preserve sharp boundary transitions, and maintain coherent global scene geometry. To prioritize task-critical region modeling, we propose FocusDepth, a prompt-conditioned monocular relative depth estimation framework that guides depth modeling to focus on target regions via box/text prompts. The core Multi-Scale Spatial-Aligned Fusion (MSSA) in FocusDepth spatially aligns multi-scale features from Segment Anything Model 3 to the Depth Anything family…
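The three requirements the abstract places on FDE (foreground accuracy, sharp boundaries, coherent global geometry) suggest scoring a prediction separately on the target region, on a band around its boundary, and on the whole image. The following is a minimal sketch of that idea, not the FDE-Bench protocol: the box convention, the `band` width, the helper names, and the choice of absolute relative error are all assumptions made for illustration. It also assumes the relative-depth prediction has already been aligned to the ground-truth scale.

```python
from typing import Dict, List, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]
Grid = List[List[float]]
Mask = List[List[bool]]


def region_masks(height: int, width: int, box: Box, band: int = 1) -> Tuple[Mask, Mask]:
    """Build a target mask and a boundary-band mask from a box prompt."""
    x0, y0, x1, y1 = box

    def in_box(x: int, y: int, pad: int) -> bool:
        # pad > 0 grows the box, pad < 0 shrinks it.
        return x0 - pad <= x < x1 + pad and y0 - pad <= y < y1 + pad

    target = [[in_box(x, y, 0) for x in range(width)] for y in range(height)]
    # Boundary band: inside the grown box but outside the shrunken box.
    boundary = [
        [in_box(x, y, band) and not in_box(x, y, -band) for x in range(width)]
        for y in range(height)
    ]
    return target, boundary


def abs_rel(pred: Grid, gt: Grid, mask: Mask) -> float:
    """Mean absolute relative error over masked pixels with valid ground truth."""
    errors = [
        abs(p - g) / g
        for pred_row, gt_row, mask_row in zip(pred, gt, mask)
        for p, g, m in zip(pred_row, gt_row, mask_row)
        if m and g > 0
    ]
    return sum(errors) / len(errors) if errors else float("nan")


def region_report(pred: Grid, gt: Grid, box: Box, band: int = 1) -> Dict[str, float]:
    """Report the error on the target region, its boundary band, and the full image."""
    height, width = len(gt), len(gt[0])
    target, boundary = region_masks(height, width, box, band)
    everything = [[True] * width for _ in range(height)]
    return {
        "target": abs_rel(pred, gt, target),
        "boundary": abs_rel(pred, gt, boundary),
        "global": abs_rel(pred, gt, everything),
    }
```

Reporting the three numbers side by side makes the trade-off visible: a model can lower the target error while the global error rises, which is the failure mode the task definition asks models to avoid.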
