TL;DR
This paper evaluates the adaptation of the Comparison Category Rating (CCR) speech quality assessment method for crowdsourcing, demonstrating its reliability and validity compared to traditional laboratory tests.
Contribution
It introduces a crowdsourcing adaptation of the CCR speech quality assessment method, showing it is reproducible, reliable, and comparable to laboratory-based tests.
Findings
CCR results are highly reproducible in crowdsourcing.
Crowdsourcing CCR correlates well with laboratory tests.
The method is suitable for large-scale speech quality evaluation.
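One common way to quantify how well crowdsourced scores agree with laboratory scores is to compare the per-condition scores from both tests using the Pearson correlation coefficient and the root-mean-square error (RMSE). The sketch below assumes that approach. It is not necessarily the analysis used in the paper, the function names are hypothetical, and no values from the paper are reproduced here.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of per-condition scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (sx * sy)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square difference between paired per-condition scores."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty, equal-length sequences")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

Here `x` would hold, for example, the crowdsourced score of each test condition and `y` the laboratory score of the same conditions, in the same order.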
Abstract
Traditionally, the Quality of Experience (QoE) of a communication system is evaluated through subjective tests. The most common test method for speech QoE is Absolute Category Rating (ACR), in which participants listen to a set of stimuli processed by the test conditions under study and rate the perceived quality of each stimulus on a rating scale. Comparison Category Rating (CCR) is another standard approach, in which participants listen to pairs of reference and processed stimuli and rate the quality of one stimulus relative to the other. The CCR method is particularly suitable for systems that improve the quality of input speech. This paper evaluates an adaptation of the CCR test procedure for assessing speech quality in a crowdsourcing set-up. The CCR method was introduced in ITU-T Rec. P.800 for laboratory-based experiments. We adapted the test for the crowdsourcing approach…
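How CCR votes turn into a score can be sketched in a few lines of Python. This is a minimal illustration, assuming the 7-point comparison scale of ITU-T Rec. P.800 (from "Much Worse" = -3 to "Much Better" = +3), where each label rates the second sample of a pair against the first, and assuming the presentation order is randomized so that the processed sample sometimes plays first. The names `CcrVote`, `normalized_score`, and `cmos_per_condition` are hypothetical, and the paper's own aggregation and data-screening steps may differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List

# 7-point CCR scale; each label rates the 2nd sample relative to the 1st.
CCR_SCALE = {
    "Much Better": 3,
    "Better": 2,
    "Slightly Better": 1,
    "About the Same": 0,
    "Slightly Worse": -1,
    "Worse": -2,
    "Much Worse": -3,
}


@dataclass
class CcrVote:
    condition: str         # hypothetical test-condition identifier
    label: str             # one of the CCR_SCALE keys
    processed_first: bool  # True if the processed sample was played before the reference


def normalized_score(vote: CcrVote) -> int:
    """Return the rating of the processed sample relative to the reference."""
    score = CCR_SCALE[vote.label]
    # If the processed sample came first, the label rates the reference
    # against the processed sample, so the sign must be flipped.
    return -score if vote.processed_first else score


def cmos_per_condition(votes: Iterable[CcrVote]) -> Dict[str, float]:
    """Average the order-normalized votes into a comparison MOS per condition."""
    by_condition: Dict[str, List[int]] = {}
    for vote in votes:
        by_condition.setdefault(vote.condition, []).append(normalized_score(vote))
    return {cond: mean(scores) for cond, scores in by_condition.items()}


votes = [
    CcrVote("enhancer_A", "Better", processed_first=False),         # +2
    CcrVote("enhancer_A", "Slightly Worse", processed_first=True),  # flipped to +1
]
print(cmos_per_condition(votes))  # {'enhancer_A': 1.5}
```

Flipping the sign for processed-first trials lets both presentation orders be pooled into one comparison MOS (CMOS), where a positive value means listeners preferred the processed speech over the reference.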
