Cross360: 360° Monocular Depth Estimation via Cross Projections Across Scales
Kun Huang, Fang-Lue Zhang, Neil Dodgson

TL;DR
Cross360 introduces a novel cross-attention architecture that effectively combines local tangent patches and global equirectangular features for accurate 360-degree monocular depth estimation, outperforming existing methods.
Contribution
The paper proposes a new cross-attention-based architecture with modules for feature alignment and progressive aggregation, improving global consistency and accuracy in 360-degree depth estimation.
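The core alignment idea is cross-attention: local tangent-patch tokens act as queries and attend to equirectangular (ERP) tokens, which carry the full 360° context. The sketch below is a minimal single-head NumPy version of that generic mechanism. It is not the authors' module. The function name `cross_projection_attention`, the token counts, the random projection matrices, and the residual update are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_projection_attention(tangent_feats, erp_feats, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention sketch (not the paper's code).

    tangent_feats: (N_t, C) tokens from local tangent patches (queries).
    erp_feats:     (N_e, C) tokens from the equirectangular map (keys/values).
    w_q, w_k, w_v: (C, D) projections; random here, learned in a real model.
    """
    q = tangent_feats @ w_q                    # (N_t, D)
    k = erp_feats @ w_k                        # (N_e, D)
    v = erp_feats @ w_v                        # (N_e, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot products, (N_t, N_e)
    attn = softmax(scores, axis=-1)            # each tangent token's weights over ERP tokens
    context = attn @ v                         # global context per tangent token, (N_t, D)
    # Residual update; this simplified sketch assumes D == C.
    return tangent_feats + context, attn

# Usage with toy sizes: 20 tangent tokens, 128 ERP tokens, 16 channels.
rng = np.random.default_rng(0)
C = 16
tangent = rng.standard_normal((20, C))
erp = rng.standard_normal((128, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) for _ in range(3))
out, attn = cross_projection_attention(tangent, erp, w_q, w_k, w_v)
```

Because the softmax runs over all ERP tokens, every tangent token can draw on information from anywhere on the sphere, which is the kind of global perception the abstract says per-patch features lack.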
Findings
Outperforms existing methods on benchmark datasets
Achieves higher accuracy in global depth consistency
Effective integration of local and global features
Abstract
360° depth estimation is a challenging research problem due to the difficulty of finding a representation that both preserves global continuity and avoids distortion in spherical images. Existing methods attempt to leverage complementary information from multiple projections, but struggle with balancing global and local consistency. Their local patch features have limited global perception, and the combined global representation does not address discrepancies in feature extraction at the boundaries between patches. To address these issues, we propose Cross360, a novel cross-attention-based architecture integrating local and global information using less-distorted tangent patches along with equirectangular features. Our Cross Projection Feature Alignment module employs cross-attention to align local tangent projection features with the equirectangular projection's 360° field of…
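Tangent patches are commonly obtained with the gnomonic (tangent-plane) projection, which is locally far less distorted than the equirectangular image. The sketch below assumes that projection and maps each pixel of a square tangent patch back to ERP pixel coordinates, which is what lets tangent features and ERP features be related to one another. The helper names, the patch field of view, and the ERP convention (longitude -π..π left to right, latitude +π/2 at the top row) are illustrative assumptions; the paper's exact patch layout is not stated here.

```python
import numpy as np

def tangent_to_sphere(x, y, lon0, lat0):
    """Inverse gnomonic projection: tangent-plane (x, y) -> (lon, lat) in radians."""
    rho = np.sqrt(x**2 + y**2)
    c = np.arctan(rho)                        # angular distance from the tangent point
    sin_c, cos_c = np.sin(c), np.cos(c)
    safe_rho = np.where(rho == 0, 1.0, rho)   # at rho == 0, y*sin_c is 0, so the term vanishes
    lat = np.arcsin(np.clip(
        cos_c * np.sin(lat0) + y * sin_c * np.cos(lat0) / safe_rho, -1.0, 1.0))
    lon = lon0 + np.arctan2(x * sin_c,
                            rho * np.cos(lat0) * cos_c - y * np.sin(lat0) * sin_c)
    lon = (lon + np.pi) % (2 * np.pi) - np.pi  # wrap longitude into [-pi, pi)
    return lon, lat

def sphere_to_erp(lon, lat, width, height):
    """Assumed ERP convention: lon -pi..pi left to right, lat +pi/2 at the top row."""
    u = (lon / (2 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v

def tangent_patch_grid(lon0, lat0, fov, size, width, height):
    """Hypothetical helper: ERP pixel coordinates for every pixel of a size x size patch."""
    half = np.tan(fov / 2)                    # plane extent covering the field of view
    s = np.linspace(-half, half, size)
    x, y = np.meshgrid(s, s[::-1])            # row 0 is the top (positive y) of the patch
    lon, lat = tangent_to_sphere(x, y, lon0, lat0)
    return sphere_to_erp(lon, lat, width, height)

# Usage: a 5x5 patch with a 90-degree field of view centred on (lon, lat) = (0, 0),
# sampled from a 512x256 ERP image.
u, v = tangent_patch_grid(0.0, 0.0, np.pi / 2, 5, 512, 256)
```

The resulting `(u, v)` grid could drive bilinear sampling of ERP features into patch space (or the reverse), which is one plausible way to pair the two projections before attention.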
Taxonomy
Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Robotics and Sensor-Based Localization
