Scaling Audio-Visual Quality Assessment Dataset via Crowdsourcing

Renyu Yang; Jian Jin; Lili Meng; Meiqin Liu; Yilin Wang; Balu Adsumilli; Weisi Lin

arXiv:2602.22659 · cs.CV · February 27, 2026

TL;DR

This paper introduces a scalable, crowdsourced approach to creating a large, diverse audio-visual quality assessment dataset, enabling better model development and multimodal perception research.

Contribution

It presents a practical crowdsourcing framework and a data preparation strategy, and uses them to build the largest AVQA dataset to date, with extensive annotations.
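The page does not describe how the data preparation strategy selects clips. As a hypothetical illustration of what "broad coverage of quality levels and semantic scenarios" can mean in practice, the sketch below performs generic stratified sampling: candidates are grouped into cells by a quality-proxy bin and a content category, and each cell contributes at most a fixed number of clips. The function `stratified_select`, the `quality_proxy` score, and the category labels are all assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_select(candidates, n_bins, per_cell, seed=0):
    """Pick up to `per_cell` clips from each (quality bin, category) cell.

    Hypothetical sketch: `candidates` is a list of
    (clip_id, quality_proxy, category) tuples, where quality_proxy is an
    assumed objective score in [0, 1] and category is a string label.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    cells = defaultdict(list)
    for clip_id, q, category in candidates:
        # Map the proxy score to a bin index, clamping to [0, n_bins - 1].
        b = max(0, min(int(q * n_bins), n_bins - 1))
        cells[(b, category)].append(clip_id)

    selected = []
    for key in sorted(cells):
        pool = cells[key]
        rng.shuffle(pool)  # random choice within each cell
        selected.extend(pool[:per_cell])  # cap over-represented cells
    return selected
```

Capping each cell keeps abundant content (for example, many high-quality clips of one genre) from dominating the dataset, while sparse cells are kept whole.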

Findings

1. Validated the crowdsourcing approach for reliable AVQA annotations
2. Created YT-NTU-AVQ, the largest diverse AVQA dataset to date
3. Enabled new research avenues in multimodal perception mechanisms

Abstract

Audio-visual quality assessment (AVQA) research has been stalled by the limitations of existing datasets: they are typically small in scale, lack diversity in content and quality, and are annotated only with overall scores. These shortcomings provide limited support for model development and multimodal perception research. We propose a practical approach for AVQA dataset construction. First, we design a crowdsourced subjective experiment framework for AVQA that breaks the constraints of in-lab settings and achieves reliable annotation across varied environments. Second, we employ a systematic data preparation strategy to ensure broad coverage of both quality levels and semantic scenarios. Third, we extend the dataset with additional annotations, enabling research on multimodal perception mechanisms and their relation to content. Finally, we validate this approach through…
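The abstract does not say how reliability is enforced when raters work in uncontrolled environments. One common, generic heuristic in crowdsourced quality studies is to screen out raters whose scores correlate poorly with the consensus of the other raters, then average the remaining scores into a mean opinion score (MOS). The minimal sketch below assumes that heuristic; the function names, the `min_r` threshold, and the minimum of three shared clips are illustrative choices, not the paper's framework.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant ratings carry no ranking information
    return sxy / sqrt(sxx * syy)


def screen_and_mos(ratings, min_r=0.7):
    """Hypothetical rater screening followed by MOS computation.

    ratings: {rater_id: {clip_id: score}}. Each rater is compared with the
    leave-one-out mean of the other raters on the clips they share.
    Returns (kept_raters, {clip_id: MOS over kept raters}).
    """
    kept = []
    for rater, scores in ratings.items():
        xs, ys = [], []
        for clip, s in scores.items():
            others = [r[clip] for o, r in ratings.items() if o != rater and clip in r]
            if others:
                xs.append(s)
                ys.append(mean(others))
        # Require a few shared clips before trusting the correlation.
        if len(xs) >= 3 and pearson(xs, ys) >= min_r:
            kept.append(rater)

    per_clip = {}
    for rater in kept:
        for clip, s in ratings[rater].items():
            per_clip.setdefault(clip, []).append(s)
    return kept, {clip: mean(v) for clip, v in per_clip.items()}
```

A rater who inverts the scale (rating clean clips low and degraded clips high) ends up with a negative correlation and is excluded before the MOS is averaged.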

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Multisensory perception and integration