Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs
Wenzhang Du

TL;DR
This paper introduces a contract-driven framework for auditing speech and singing quality. Service graphs are evaluated against human-interpretable experience contracts, which gives quality estimates that are more stable and easier to interpret than a single MOS score.
Contribution
The paper proposes a contract-driven QoE auditing framework that contains classical MOS regression as a special case, remains more stable than MOS under graph view transformations, and relates the sample complexity of learning to contract semantics.
Findings
Contract-driven quality is more stable than MOS under graph view transformations (e.g., pooling by system vs. by system type).
The framework achieves MOS prediction accuracy comparable to state-of-the-art models.
The effective sample complexity of learning contracts is governed by contract semantics rather than merely by the number of contracts.
Abstract
Subjective mean opinion scores (MOS) remain the de facto target for non-intrusive speech and singing quality assessment. However, MOS is a scalar that collapses heterogeneous user expectations, ignores service-level objectives, and is difficult to compare across deployment graphs. We propose a contract-driven QoE auditing framework: each service graph G is evaluated under a set of human-interpretable experience contracts C, yielding a contract-level satisfaction vector Q(G, C). We show that (i) classical MOS regression is a special case with a degenerate contract set, (ii) contract-driven quality is more stable than MOS under graph view transformations (e.g., pooling by system vs. by system type), and (iii) the effective sample complexity of learning contracts is governed by contract semantics rather than merely the dimensionality of C. We instantiate the framework on URGENT2024 MOS…
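To make the idea of a contract-level satisfaction vector concrete, the sketch below is a minimal illustration, not the paper's implementation. It assumes that a contract can be modeled as a function from a pool of listener ratings to a single number, and that a graph view is a way of grouping ratings (by system or by system type). The names `Contract`, `fraction_at_least`, `satisfaction_vector`, and `pool`, the thresholds, and the toy ratings are all hypothetical; the excerpt does not define Q(G, C) at this level of detail.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Scores = Sequence[float]
# Hypothetical rating record: (system id, system type, listener score on a 1-5 scale).
Rating = Tuple[str, str, float]


@dataclass(frozen=True)
class Contract:
    """Assumed form of an experience contract: a named map from ratings to a number."""
    name: str
    evaluate: Callable[[Scores], float]


def fraction_at_least(threshold: float) -> Callable[[Scores], float]:
    """Build an evaluator returning the share of ratings at or above `threshold`."""
    def evaluate(scores: Scores) -> float:
        return sum(s >= threshold for s in scores) / len(scores)
    return evaluate


def satisfaction_vector(scores: Scores, contracts: Sequence[Contract]) -> Dict[str, float]:
    """Evaluate every contract on one pool of ratings (one entry per contract)."""
    return {c.name: c.evaluate(scores) for c in contracts}


def pool(ratings: Sequence[Rating], key: Callable[[Rating], Hashable]) -> Dict[Hashable, List[float]]:
    """Group scores under a chosen view, e.g. by system or by system type."""
    groups: Dict[Hashable, List[float]] = {}
    for r in ratings:
        groups.setdefault(key(r), []).append(r[2])
    return groups


# Degenerate contract set: a single "contract" that is just the mean score (MOS).
MOS_ONLY = [Contract("mos", mean)]

# Illustrative contract set with made-up thresholds.
CONTRACTS = [
    Contract("acceptable", fraction_at_least(3.0)),
    Contract("excellent", fraction_at_least(4.0)),
]

# Toy data, invented for this example only.
ratings: List[Rating] = [
    ("sysA", "neural", 4.0),
    ("sysA", "neural", 3.0),
    ("sysB", "neural", 5.0),
    ("sysC", "dsp", 2.0),
]

by_system = pool(ratings, key=lambda r: r[0])
by_type = pool(ratings, key=lambda r: r[1])

q_by_system = {k: satisfaction_vector(v, CONTRACTS) for k, v in by_system.items()}
q_by_type = {k: satisfaction_vector(v, CONTRACTS) for k, v in by_type.items()}
mos_by_type = {k: satisfaction_vector(v, MOS_ONLY) for k, v in by_type.items()}
```

Under these assumptions, passing `MOS_ONLY` reduces the vector to a single mean score, which mirrors the abstract's remark that MOS regression corresponds to a degenerate contract set. Computing the vector under both `by_system` and `by_type` gives the two views whose agreement one would compare when studying stability.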
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Speech Recognition and Synthesis
