DepthFocus: Controllable Depth Estimation for See-Through Scenes
Junhong Min, Jimin Kim, Minwook Kim, Cheol-Hui Min, Youngpil Jeon, Minyong Choi

TL;DR
DepthFocus introduces a controllable depth estimation model using a steerable transformer that actively adjusts focus to resolve layered ambiguities in transparent scenes, outperforming existing methods.
Contribution
We propose DepthFocus, a novel steerable transformer that dynamically modulates depth perception based on target focus, enabling active, intent-driven 3D perception in complex scenes.
Findings
Achieves state-of-the-art results on multiple benchmarks.
Effectively resolves depth ambiguities in transparent and reflective scenes.
Maintains high precision in opaque regions while focusing on target depths.
Abstract
Depth in the real world is rarely singular. Transmissive materials create layered ambiguities that confound conventional perception systems. Existing models remain passive; conventional approaches typically estimate static depth maps anchored to the nearest surface, and even recent multi-head extensions suffer from a representational bottleneck due to fixed feature representations. This stands in contrast to human vision, which actively shifts focus to perceive a desired depth. We introduce DepthFocus, a steerable Vision Transformer that redefines stereo depth estimation as condition-aware control. Instead of extracting fixed features, our model dynamically modulates its computation based on a physical reference depth, integrating dual conditional mechanisms to selectively perceive geometry aligned with the desired focus. Leveraging a newly curated large-scale synthetic…
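To make the idea of modulating computation with a reference depth more concrete, here is a minimal sketch of one well-known conditioning technique, FiLM-style feature-wise scaling and shifting. It is a stand-in for illustration only: the abstract does not say what the two conditional mechanisms are, so the sinusoidal depth embedding, the function names (`depth_embedding`, `film_modulate`), and all weights below are assumptions, not the authors' design.

```python
import math


def depth_embedding(ref_depth_m, num_freqs=4):
    """Sinusoidal embedding of a scalar reference depth.

    Hypothetical choice for this sketch; the paper's encoding is not
    described in the abstract.
    """
    emb = []
    for k in range(num_freqs):
        freq = 2.0 ** k
        emb.append(math.sin(freq * ref_depth_m))
        emb.append(math.cos(freq * ref_depth_m))
    return emb


def linear(x, weight, bias):
    """Plain dense layer: one output per row of `weight`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def film_modulate(features, ref_depth_m, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM-style conditioning: scale and shift each feature channel
    using parameters predicted from the reference depth (assumed setup)."""
    emb = depth_embedding(ref_depth_m, num_freqs=len(w_gamma[0]) // 2)
    gamma = linear(emb, w_gamma, b_gamma)  # per-channel scale offset
    beta = linear(emb, w_beta, b_beta)     # per-channel shift
    return [(1.0 + g) * f + b for f, g, b in zip(features, gamma, beta)]


# Illustrative usage with made-up weights: the same input features are
# transformed differently depending on the requested focus depth.
features = [0.5, -1.0]
weights = [[0.1] * 8, [-0.1] * 8]
zeros = [0.0, 0.0]
near = film_modulate(features, 0.5, weights, zeros, weights, zeros)
far = film_modulate(features, 3.0, weights, zeros, weights, zeros)
```

The point of the sketch is only that the features become a function of the focus depth, rather than being fixed once per image as in the passive models the abstract criticizes.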
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Image Enhancement Techniques
