Research on Audio-Visual Quality Assessment Dataset and Method for User-Generated Omnidirectional Video
Fei Zhao, Da Pan, Zelu Qi, Ping Shi

TL;DR
This paper introduces a new dataset and baseline model for audio-visual quality assessment of user-generated omnidirectional videos, addressing a gap in the study of AVQA for UGC ODVs in the context of the Metaverse.
Contribution
It constructs a comprehensive UGC ODV dataset with subjective quality scores and develops an effective AVQA baseline model for this emerging content type.
Findings
The dataset includes 300 videos across 10 scene types.
The baseline model is reported to achieve the best performance on the proposed dataset.
Mean Opinion Scores (MOSs) were collected through a subjective audio-visual quality experiment.
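As a rough illustration of what a MOS is, the sketch below averages the ratings that each video received from its viewers. The rating scale, the video IDs, the function name `mean_opinion_scores`, and the sample values are all assumptions made for this example. They are not taken from the paper, which may also apply subject screening or score normalization before averaging.

```python
from statistics import mean


def mean_opinion_scores(ratings):
    """Map each video ID to the arithmetic mean of its subjective ratings.

    Videos with no ratings are skipped rather than assigned a score.
    """
    return {
        video_id: mean(scores)
        for video_id, scores in ratings.items()
        if scores
    }


# Hypothetical ratings on an assumed 1-5 scale; not data from the paper.
example_ratings = {
    "video_001": [4, 5, 3, 4],
    "video_002": [2, 3, 2, 1],
}

mos = mean_opinion_scores(example_ratings)
for video_id, score in mos.items():
    print(f"{video_id}: MOS = {score:.2f}")
```

A plain mean is the simplest possible aggregation. Subjective studies often reject outlier subjects or normalize per-subject scores first, and such steps would slot in before the averaging.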
Abstract
In response to the rising prominence of the Metaverse, omnidirectional videos (ODVs) have garnered notable interest, gradually shifting from professional-generated content (PGC) to user-generated content (UGC). However, the study of audio-visual quality assessment (AVQA) within ODVs remains limited. To address this, we construct a dataset of UGC omnidirectional audio and video (A/V) content. Five individuals captured 300 videos covering 10 different scene types using two different types of omnidirectional cameras. A subjective AVQA experiment is conducted on the dataset to obtain the Mean Opinion Scores (MOSs) of the A/V sequences. After that, to facilitate the development of the UGC-ODV AVQA field, we construct an effective AVQA baseline model on the proposed dataset. The baseline model consists of a video feature extraction module, audio feature…
Taxonomy
Topics: Speech and Audio Processing · Video Analysis and Summarization · Music and Audio Processing
