TL;DR
This paper presents an open-source crowdsourcing toolkit for subjective evaluation of noise suppression algorithms, demonstrating high validity and reproducibility compared to laboratory methods, and practical application in large-scale challenges.
Contribution
The authors developed and validated an automated crowdsourcing toolkit for subjective speech quality evaluation, aligning with ITU standards and suitable for large-scale assessments.
Findings
High correlation with laboratory MOS scores on all three P.835 scales (average PCC = 0.961).
Reproducibility confirmed with PCC=0.99 in round-robin tests.
Practical application demonstrated in the INTERSPEECH 2021 challenge.
Abstract
The quality of speech communication systems that include noise suppression algorithms is typically evaluated in laboratory experiments according to ITU-T Rec. P.835, in which participants rate background noise, speech signal, and overall quality separately. This paper introduces an open-source toolkit for conducting subjective quality evaluation of noise-suppressed speech in crowdsourcing. We followed ITU-T Rec. P.835 and P.808 and highly automated the process to prevent moderator errors. To assess the validity of our evaluation method, we compared the Mean Opinion Scores (MOS), calculated using ratings collected with our implementation, with the MOS values from a standard laboratory experiment conducted according to ITU-T Rec. P.835. Results show high validity on all three scales, namely background noise, speech signal, and overall quality (average PCC = 0.961).…
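The validity check described in the abstract, comparing crowdsourced MOS against laboratory MOS with a Pearson correlation coefficient (PCC), can be sketched as follows. This is a minimal illustration, not the toolkit's code: the function names, the condition labels, and all rating values are hypothetical, and the sketch assumes each rating is a 1 to 5 vote on a single scale that is averaged per test condition.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def mos_per_condition(ratings):
    """Average the votes for each condition (MOS = mean of the ratings)."""
    by_condition = defaultdict(list)
    for condition, score in ratings:
        by_condition[condition].append(score)
    return {c: mean(v) for c, v in by_condition.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)


# Hypothetical data: (condition, rating) pairs from crowd workers,
# and made-up laboratory MOS values for the same conditions.
crowd_ratings = [("c1", 2), ("c1", 3), ("c2", 4), ("c2", 5), ("c3", 3), ("c3", 3)]
lab_mos = {"c1": 2.4, "c2": 4.6, "c3": 3.1}

crowd_mos = mos_per_condition(crowd_ratings)
conditions = sorted(lab_mos)
pcc = pearson([crowd_mos[c] for c in conditions], [lab_mos[c] for c in conditions])
print(f"PCC between crowd and lab MOS: {pcc:.3f}")
```

Under these assumptions, the same computation would be repeated for each of the three P.835 scales (background noise, speech signal, overall quality) and the three coefficients averaged to obtain a single summary figure.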
