S²Net: Accurate Panorama Depth Estimation on Spherical Surface
Meng Li, Senbo Wang, Weihao Yuan, Weichao Shen, Zhe Sheng, Zilong Dong

TL;DR
This paper introduces S2Net, a deep learning model that accurately estimates depth from panoramic images by projecting features onto a unit spherical surface and fusing them with a global cross-attention-based fusion module. It outperforms previous state-of-the-art methods.
Contribution
The paper presents a novel end-to-end network that handles distortion in panoramic images and effectively captures global context for depth estimation.
Findings
Outperforms previous state-of-the-art methods on five datasets.
Uses spherical surface projection to reduce distortion effects.
Employs a global cross-attention fusion module for better feature integration.
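The abstract is cut off before it describes the fusion module, so the following is only a hedged illustration of the general technique named above, not the paper's design. It implements plain scaled dot-product cross-attention in which every query token attends to every source token (hence "global"), followed by a residual add. Which features serve as queries versus keys/values, and the `fuse` helper itself, are hypothetical; learned query/key/value projections, multiple heads, and normalization layers are omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where each query attends to *all* keys.

    queries: list of d-dim vectors; keys: list of d-dim vectors;
    values: list of vectors, one per key.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        # Similarity of this query to every key (no locality restriction).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of all value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(target_feats, source_feats):
    """Hypothetical fusion: target tokens query source tokens, then residual add.

    Assumes both feature sets share the same channel dimension.
    """
    attended = cross_attention(target_feats, source_feats, source_feats)
    return [[t + a for t, a in zip(tv, av)]
            for tv, av in zip(target_feats, attended)]
```

When all keys are identical, every query receives the plain mean of the values; as one key becomes much more similar to a query, the output approaches that key's value.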
Abstract
Monocular depth estimation is an ambiguous problem; thus, global structural cues play an important role in current data-driven single-view depth estimation methods. Panorama images capture the complete spatial information of their surroundings using the equirectangular projection, which introduces large distortion. A depth estimation method must therefore handle this distortion and extract global context information from the image. In this paper, we propose an end-to-end deep network for monocular panorama depth estimation on a unit spherical surface. Specifically, we project the feature maps extracted from equirectangular images onto a unit spherical surface sampled by uniformly distributed grids, where the decoder network can aggregate the information from the distortion-reduced feature maps. Meanwhile, we propose a global cross-attention-based fusion module to fuse the…
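To make the projection step concrete, here is a minimal, dependency-free sketch that resamples an equirectangular feature map at points spread roughly uniformly over the unit sphere. The Fibonacci lattice is an assumption standing in for whatever uniform grid the paper actually uses, and all function names are hypothetical. Each sphere point is converted to longitude/latitude, mapped to continuous pixel coordinates, and bilinearly sampled, wrapping across the left/right seam and clamping at the poles.

```python
import math

# Golden angle in radians (~2.39996); successive points rotate by this amount.
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))


def fibonacci_sphere(n):
    """Return n roughly uniformly spread unit vectors (x, y, z); z points 'up'."""
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n         # evenly spaced heights in (-1, 1)
        r = math.sqrt(max(0.0, 1.0 - z * z))  # radius of the circle at height z
        theta = GOLDEN_ANGLE * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def sphere_to_equirect(point, height, width):
    """Map a unit vector to continuous (u, v) pixel coords of an H x W panorama."""
    x, y, z = point
    lon = math.atan2(y, x)                     # longitude in [-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, z)))    # latitude in [-pi/2, pi/2]
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5  # pixel-center convention
    v = (math.pi / 2.0 - lat) / math.pi * height - 0.5
    return u, v


def sample_bilinear(fmap, u, v):
    """Bilinearly sample fmap[row][col][channel]; wraps in u, clamps in v."""
    h, w = len(fmap), len(fmap[0])
    v = min(max(v, 0.0), h - 1.0)
    v0 = int(math.floor(v))
    v1 = min(v0 + 1, h - 1)
    dv = v - v0
    u_floor = math.floor(u)
    du = u - u_floor
    u0 = int(u_floor) % w                      # longitude wraps around the seam
    u1 = (u0 + 1) % w
    out = []
    for c in range(len(fmap[0][0])):
        top = fmap[v0][u0][c] * (1.0 - du) + fmap[v0][u1][c] * du
        bottom = fmap[v1][u0][c] * (1.0 - du) + fmap[v1][u1][c] * du
        out.append(top * (1.0 - dv) + bottom * dv)
    return out


def project_to_sphere(fmap, n_points):
    """Resample an equirectangular feature map onto n_points sphere samples."""
    h, w = len(fmap), len(fmap[0])
    points = fibonacci_sphere(n_points)
    features = [sample_bilinear(fmap, *sphere_to_equirect(p, h, w)) for p in points]
    return points, features
```

Because the samples are spread evenly by area, the stretched polar rows of the equirectangular map are no longer oversampled relative to the equator, which illustrates one sense in which spherical features are "distortion-reduced". For example, calling `project_to_sphere(fmap, 100)` on a 4×8 map whose single channel stores the row index gives a first sample (nearest the north pole) that reads row 0 and a last sample (nearest the south pole) that reads row 3.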
