SalFormer360: a transformer-based saliency estimation model for 360-degree videos
Mahmoud Z. A. Wahba, Francesco Barbato, Sara Baldoni, and Federica Battisti

TL;DR
SalFormer360 is a transformer-based model for 360-degree video saliency estimation that adapts a 2D segmentation architecture and incorporates user attention bias, outperforming existing methods on major benchmarks.
Contribution
It introduces a transformer-based architecture for 360-degree saliency estimation that combines a SegFormer encoder with a custom decoder and a Viewing Center Bias to improve accuracy.
Findings
Achieves up to 18.6% higher Pearson correlation on the VR-EyeTracking dataset.
Outperforms state-of-the-art methods on three major saliency benchmarks.
Demonstrates the effectiveness of transformer architectures for 360-degree saliency prediction.
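For readers unfamiliar with the metric named in the findings, the following is a minimal sketch of how the Pearson correlation coefficient is commonly computed between a predicted and a ground-truth saliency map. The function name `pearson_cc` and the `eps` guard are illustrative assumptions, not taken from the paper, and the paper's evaluation protocol (for example, any latitude weighting for equirectangular frames) may differ.

```python
import numpy as np

def pearson_cc(pred, gt, eps=1e-12):
    """Pearson correlation between two saliency maps of the same shape.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()
    # Remove the mean so the score is invariant to constant offsets.
    pred = pred - pred.mean()
    gt = gt - gt.mean()
    # Normalize by the product of the two standard deviations (up to a factor
    # that cancels); eps avoids division by zero for constant maps.
    denom = np.sqrt((pred ** 2).sum() * (gt ** 2).sum())
    return float((pred * gt).sum() / (denom + eps))
```

The score lies in [-1, 1] and is unchanged by positive scaling or shifting of either map, so it measures agreement in the spatial pattern of saliency rather than in absolute values.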
Abstract
Saliency estimation has received growing attention in recent years due to its importance in a wide range of applications. In the context of 360-degree video, it has been particularly valuable for tasks such as viewport prediction and immersive content optimization. In this paper, we propose SalFormer360, a novel saliency estimation model for 360-degree videos built on a transformer-based architecture. Our approach combines an existing encoder architecture, SegFormer, with a custom decoder. The SegFormer model was originally developed for 2D segmentation tasks, and we fine-tune it for 360-degree content. To further enhance prediction accuracy, we incorporate a Viewing Center Bias that reflects user attention in 360-degree environments. Extensive experiments on the three largest benchmark datasets for saliency estimation demonstrate that…
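The abstract does not specify how the Viewing Center Bias is formulated. As a rough illustration of the general idea only, the sketch below builds a Gaussian prior centered on the middle of an equirectangular frame and blends it with a predicted saliency map. Everything here is an assumption made for illustration: the function names `center_bias_prior` and `apply_center_bias`, the Gaussian form, the default standard deviations, and the linear blending weight. It should not be read as the authors' method.

```python
import numpy as np

def center_bias_prior(height, width, sigma_lat_deg=20.0, sigma_lon_deg=40.0):
    """Gaussian prior over an equirectangular frame, peaked at (lat=0, lon=0).

    Assumed form for illustration; the sigmas are arbitrary placeholders.
    """
    # Rows span latitude +90 (top) to -90 (bottom); columns span longitude
    # -180 to 180 (exclusive), so the frame center maps to (0, 0).
    lat = np.linspace(90.0, -90.0, height)[:, None]
    lon = np.linspace(-180.0, 180.0, width, endpoint=False)[None, :]
    prior = np.exp(-0.5 * ((lat / sigma_lat_deg) ** 2 + (lon / sigma_lon_deg) ** 2))
    return prior / prior.max()

def apply_center_bias(saliency, prior, weight=0.5, eps=1e-12):
    """Blend a predicted map with the prior and renormalize to sum to 1.

    The linear blend and `weight` are hypothetical design choices.
    """
    saliency = np.asarray(saliency, dtype=np.float64)
    saliency = saliency / (saliency.max() + eps)  # put both maps on a [0, 1] scale
    blended = (1.0 - weight) * saliency + weight * prior
    return blended / (blended.sum() + eps)
```

With a uniform prediction, for example `apply_center_bias(np.ones((5, 8)), center_bias_prior(5, 8))`, the output peaks at the frame center, which is the behavior a center-oriented viewing prior is meant to encode.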
Taxonomy
Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Virtual Reality Applications and Impacts
