Spherical Vision Transformer for 360-degree Video Saliency Prediction

Mert Cokelek; Nevrez Imamoglu; Cagri Ozcinar; Erkut Erdem; Aykut Erdem

arXiv:2308.13004·cs.CV·August 28, 2023

TL;DR

This paper introduces SalViT360, a novel spherical vision transformer model that effectively predicts saliency in 360-degree videos by leveraging tangent image representations and a spherical geometry-aware self-attention mechanism, addressing unique challenges in omnidirectional video understanding.

Contribution

The paper is the first to employ tangent images for omnidirectional saliency prediction. It also introduces a spherical geometry-aware self-attention mechanism and a consistency-based unsupervised regularization term.

Findings

01. Outperforms state-of-the-art methods on three datasets.

02. Effectively handles spherical distortion and high resolution.

03. Reduces artefacts that occur after inverse projection in projection-based dense-prediction models.

Abstract

Growing interest in omnidirectional videos (ODVs), which capture the full field-of-view (FOV), has made 360-degree saliency prediction increasingly important in computer vision. However, predicting where humans look in 360-degree scenes presents unique challenges, including spherical distortion, high resolution, and limited labelled data. We propose SalViT360, a novel vision-transformer-based model for omnidirectional videos that leverages tangent image representations. We introduce a spherical geometry-aware spatiotemporal self-attention mechanism capable of effective omnidirectional video understanding. Furthermore, we present a consistency-based unsupervised regularization term for projection-based 360-degree dense-prediction models that reduces the artefacts that occur in predictions after inverse projection. Our approach is the first to employ tangent images for omnidirectional…
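Tangent images, as mentioned in the abstract, are commonly built with the gnomonic projection, which maps points on the sphere onto a plane touching it at a chosen centre. The sketch below illustrates that mapping and its inverse in plain Python. It is a minimal illustration of the standard projection formulas, not code from the SalViT360 repository; the function names (`gnomonic_forward`, `gnomonic_inverse`, `sphere_to_erp_pixel`) and the equirectangular pixel convention are assumptions made for this example.

```python
import math


def gnomonic_forward(lon, lat, lon0, lat0):
    """Project a sphere point (radians) onto the plane tangent at (lon0, lat0)."""
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(lon - lon0))
    if cos_c <= 0.0:
        # Only the hemisphere facing the tangent point can be projected.
        raise ValueError("point is not on the hemisphere facing the tangent plane")
    x = math.cos(lat) * math.sin(lon - lon0) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(lon - lon0)) / cos_c
    return x, y


def gnomonic_inverse(x, y, lon0, lat0):
    """Map tangent-plane coordinates back to (lon, lat) in radians."""
    rho = math.hypot(x, y)
    if rho == 0.0:
        return lon0, lat0  # the plane origin is the tangent point itself
    c = math.atan(rho)
    sin_lat = (math.cos(c) * math.sin(lat0)
               + y * math.sin(c) * math.cos(lat0) / rho)
    lat = math.asin(max(-1.0, min(1.0, sin_lat)))  # clamp against rounding error
    lon = lon0 + math.atan2(
        x * math.sin(c),
        rho * math.cos(lat0) * math.cos(c) - y * math.sin(lat0) * math.sin(c),
    )
    return lon, lat  # lon is not wrapped back into [-pi, pi) here


def sphere_to_erp_pixel(lon, lat, width, height):
    """Assumed equirectangular layout: lon in [-pi, pi) left to right,
    lat from +pi/2 (top row) to -pi/2 (bottom row)."""
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v
```

To sample one tangent patch from an equirectangular frame under these assumptions, one would iterate over a grid of `(x, y)` plane coordinates, call `gnomonic_inverse` to get the sphere location, and then `sphere_to_erp_pixel` to find where to read from the frame. Going the other way (the inverse projection the abstract refers to) places per-patch predictions back onto the sphere, and overlapping patches are where inconsistencies can show up.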

Code & Models

Repositories

MertCokelek/SalViT360
PyTorch · Official

Taxonomy

Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Video Surveillance and Tracking Methods