Towards a Perceived Audiovisual Quality Model for Immersive Content
Randy Frans Fela, Nick Zacharov, Søren Forchhammer

TL;DR
This study investigates audiovisual quality assessment for immersive 360 videos and ambisonic audio, proposing models that correlate objective metrics with subjective perceptions to enhance predictive accuracy.
Contribution
It introduces a new audiovisual quality model for immersive content, emphasizing the importance of synchronized high-quality databases and improved subjective testing methodologies.
Findings
Cross-Format SPSNR-NN shows a slightly higher linear and monotonic correlation with subjective video scores than the other PSNR variants.
A power model achieves high correlation with audiovisual test data.
Enhanced assessor training improves discrimination of multichannel audio quality.
Abstract
This paper studies the quality of multimedia content, focusing on 360 video and ambisonic spatial audio reproduced using a head-mounted display and a multichannel loudspeaker setup. Encoding parameters following basic video quality test conditions for 360 videos were selected, and a low-bitrate codec was used for the audio encoder. Three subjective experiments were performed, for audio, video, and audiovisual content respectively. Peak signal-to-noise ratio (PSNR) and its variants for 360 videos were computed to obtain objective quality metrics, which were subsequently correlated with the subjective video scores. This study shows that Cross-Format SPSNR-NN has a slightly higher linear and monotonic correlation over all video sequences. In the audiovisual modeling, a power model shows the highest correlation between test data and predicted scores. We concluded that to enable the development of…
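The abstract refers to PSNR and to "linear and monotonic" correlation between objective metrics and subjective scores. As a minimal sketch of what those quantities mean, the Python below computes plain PSNR over flat pixel sequences, plus Pearson (linear) and Spearman (monotonic) correlation coefficients. It is an illustration under assumptions, not the paper's pipeline: plain PSNR is not the sphere-aware Cross-Format SPSNR-NN variant, and the function names (`psnr`, `pearson`, `ranks`, `spearman`) and the 8-bit peak value of 255 are choices made here for the example.

```python
import math


def psnr(ref, test, peak=255.0):
    """Plain PSNR in dB between two equal-length sequences of pixel values.

    Assumption: 8-bit content (peak=255). This is NOT S-PSNR-NN, which
    samples points on the sphere before computing the error.
    """
    if len(ref) != len(test) or len(ref) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def pearson(x, y):
    """Pearson correlation coefficient (linear association)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (monotonic association)."""
    return pearson(ranks(x), ranks(y))
```

In use, one would pass a list of per-sequence metric values and the matching list of mean subjective scores to `pearson` and `spearman`. A metric that tracks subjective quality linearly scores high on both, while one that is only monotonically related scores high on Spearman but lower on Pearson.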
