Listen to the Unexpected: Self-Supervised Surprise Detection for Efficient Viewport Prediction

Arman Nik Khah; Ravi Prakash

arXiv:2601.02629·cs.MM·January 7, 2026

TL;DR

This paper introduces a self-supervised framework that detects surprising spatial-audio events and uses them to improve viewport prediction in 360-degree video streaming, reducing bitrate waste by up to 18%.

Contribution

It presents a novel self-learning approach combining graph neural networks and temporal modeling to detect auditory surprises for enhanced viewport prediction.

Findings

01 Reduces bitrate waste by up to 18% with audio surprise integration.

02 Demonstrates effectiveness of auditory cues in viewport prediction.

03 Validates approach on AVTrack360 dataset.

Abstract

Adaptive streaming of 360-degree video relies on viewport prediction to allocate bandwidth efficiently. Current approaches predominantly use visual saliency or historical gaze patterns, neglecting the role of spatial audio in guiding user attention. This paper presents a self-learning framework for detecting "surprising" auditory events -- moments that deviate from learned temporal expectations -- and demonstrates their utility for viewport prediction. The proposed architecture combines $SE(3)$-equivariant graph neural networks with recurrent temporal modeling, trained via a dual self-supervised objective. A key feature is the natural modeling of temporal attention decay: surprise is high at event onset but diminishes as the listener adapts. Experiments on the AVTrack360 dataset show that integrating audio surprise with visual cues reduces bitrate waste by up to 18% compared to…
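The decay behaviour described in the abstract (surprise is high at event onset, then diminishes as the listener adapts) can be illustrated with a toy sketch. This is not the paper's architecture: no graph neural network or recurrent model is involved. It assumes, purely for illustration, that surprise is the absolute error of an exponentially adapting expectation. The names `surprise_trace` and `adapt_rate` and the sample signal are hypothetical.

```python
def surprise_trace(signal, adapt_rate=0.5):
    """Toy surprise measure: absolute error of a running expectation.

    Assumption (not from the paper): the listener's expectation is an
    exponential moving average that moves toward each new observation.
    """
    expectation = signal[0]
    trace = []
    for value in signal:
        # Surprise is how far the observation is from what was expected.
        trace.append(abs(value - expectation))
        # The listener adapts: the expectation moves toward the observation.
        expectation += adapt_rate * (value - expectation)
    return trace


# Hypothetical audio-energy signal: a quiet scene, then a sustained loud event.
signal = [0.0] * 3 + [1.0] * 5
trace = surprise_trace(signal)
print(trace)
```

In this sketch the surprise peaks at the step where the loud event begins and halves at every later step, because the expectation closes half of the remaining gap each time. A larger `adapt_rate` would model a listener who habituates faster.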

Taxonomy

Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Video Analysis and Summarization