Spherical Vision Transformer for 360-degree Video Saliency Prediction
Mert Cokelek, Nevrez Imamoglu, Cagri Ozcinar, Erkut Erdem, Aykut Erdem

TL;DR
This paper introduces SalViT360, a novel spherical vision transformer model that effectively predicts saliency in 360-degree videos by leveraging tangent image representations and a spherical geometry-aware self-attention mechanism, addressing unique challenges in omnidirectional video understanding.
Contribution
It is the first to employ tangent images in omnidirectional saliency prediction and introduces a spherical geometry-aware self-attention mechanism and a consistency-based unsupervised regularization.
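To make the tangent image idea concrete, the following is a minimal sketch of the standard inverse gnomonic projection, which maps a square grid on a plane tangent to the sphere back to latitude and longitude. It is not taken from the SalViT360 code. The function name `tangent_grid_to_sphere`, the field-of-view value, and the grid size are hypothetical choices made for illustration.

```python
import numpy as np

def tangent_grid_to_sphere(lat0, lon0, fov_deg=80.0, size=8):
    """Map a size x size grid on the plane tangent at (lat0, lon0)
    to spherical coordinates via the inverse gnomonic projection.

    lat0, lon0 are in radians. fov_deg and size are illustrative
    assumptions, not values from the paper.
    """
    # Half-extent of the tangent plane for the requested field of view.
    half = np.tan(np.radians(fov_deg) / 2.0)
    xs = np.linspace(-half, half, size)
    # Rows run from top (positive y) to bottom (negative y).
    x, y = np.meshgrid(xs, xs[::-1])

    rho = np.sqrt(x**2 + y**2)
    c = np.arctan(rho)
    sin_c, cos_c = np.sin(c), np.cos(c)
    # Avoid division by zero at the tangent point itself.
    safe_rho = np.where(rho == 0, 1.0, rho)

    lat = np.arcsin(np.clip(
        cos_c * np.sin(lat0) + y * sin_c * np.cos(lat0) / safe_rho,
        -1.0, 1.0))
    lon = lon0 + np.arctan2(
        x * sin_c,
        rho * np.cos(lat0) * cos_c - y * np.sin(lat0) * sin_c)
    # Wrap longitude into [-pi, pi).
    lon = (lon + np.pi) % (2 * np.pi) - np.pi
    return lat, lon
```

Sampling an equirectangular frame at the returned coordinates yields one locally undistorted patch; repeating this for a set of tangent points covering the sphere gives the kind of patch set a transformer could consume as tokens.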
Findings
Outperforms state-of-the-art methods on three datasets.
Effectively handles spherical distortion and high resolution.
Reduces artefacts in dense-prediction models.
Abstract
The growing interest in omnidirectional videos (ODVs) that capture the full field-of-view (FOV) has made 360-degree saliency prediction increasingly important in computer vision. However, predicting where humans look in 360-degree scenes presents unique challenges, including spherical distortion, high resolution, and limited labelled data. We propose a novel vision-transformer-based model for omnidirectional videos named SalViT360 that leverages tangent image representations. We introduce a spherical geometry-aware spatiotemporal self-attention mechanism that is capable of effective omnidirectional video understanding. Furthermore, we present a consistency-based unsupervised regularization term for projection-based 360-degree dense-prediction models to reduce artefacts in the predictions that occur after inverse projection. Our approach is the first to employ tangent images for omnidirectional…
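One way to picture a consistency-based unsupervised regularizer is as a penalty on the disagreement between two predictions of the same frame obtained through different projection configurations, after both have been mapped back to the equirectangular grid. The sketch below uses a symmetric KL divergence as the discrepancy measure. This choice, the function name `consistency_loss`, and the `eps` parameter are assumptions for illustration; the abstract does not specify the form of the term.

```python
import numpy as np

def consistency_loss(pred_a, pred_b, eps=1e-8):
    """Symmetric KL divergence between two non-negative saliency maps.

    pred_a and pred_b are assumed to be predictions of the same frame
    produced under two different projection configurations. The choice
    of symmetric KL is an illustrative assumption, not the paper's
    stated loss.
    """
    # Normalise each map so it sums to one, like a probability distribution.
    p = pred_a / (pred_a.sum() + eps)
    q = pred_b / (pred_b.sum() + eps)
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)))
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)))
    return 0.5 * (kl_pq + kl_qp)
```

Because the term compares two predictions with each other rather than with ground-truth fixations, it needs no labels, which is what makes a regularizer of this kind unsupervised.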
Taxonomy
Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Video Surveillance and Tracking Methods
