Multivariate Probabilistic Assessment of Speech Quality
Fredrik Cumlin, Xinyu Liang, Victor Ungureanu, Chandan K. A. Reddy, Christian Schüldt, Saikat Chatterjee

TL;DR
This paper introduces a multivariate probabilistic model for speech quality assessment that jointly estimates multiple dimensions of speech quality, providing uncertainty and correlation insights beyond traditional single-score metrics.
Contribution
It extends univariate MOS estimation to a multivariate framework using Gaussian modeling, enabling joint assessment of multiple speech quality dimensions with uncertainty quantification.
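As a rough illustration of what Gaussian modeling with uncertainty quantification involves, the sketch below assumes a network head that outputs a mean vector μ over the five NISQA dimensions plus D(D+1)/2 unconstrained values. Those values are arranged into a lower-triangular Cholesky factor L with a positive diagonal, so Σ = L Lᵀ is always a valid covariance, and training would minimize the negative log-likelihood of the human ratings y. The NLL uses log det Σ = 2 Σᵢ log Lᵢᵢ and (y − μ)ᵀ Σ⁻¹ (y − μ) = ‖L⁻¹(y − μ)‖², so Σ is never inverted. The softplus on the diagonal, the row-wise ordering of the raw values, and the names `build_cholesky`, `forward_substitute`, and `gaussian_nll` are hypothetical choices for this example, not details taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); always strictly positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def build_cholesky(raw: List[float], dim: int) -> Matrix:
    """Fill a lower-triangular matrix row by row from dim*(dim+1)/2 raw values.

    Hypothetical parameterization: softplus keeps the diagonal positive,
    off-diagonal entries are used as-is.
    """
    assert len(raw) == dim * (dim + 1) // 2
    L = [[0.0] * dim for _ in range(dim)]
    k = 0
    for i in range(dim):
        for j in range(i + 1):
            L[i][j] = softplus(raw[k]) if i == j else raw[k]
            k += 1
    return L


def forward_substitute(L: Matrix, b: List[float]) -> List[float]:
    """Solve L z = b for z, where L is lower triangular."""
    z: List[float] = []
    for i in range(len(b)):
        s = sum(L[i][j] * z[j] for j in range(i))
        z.append((b[i] - s) / L[i][i])
    return z


def gaussian_nll(y: List[float], mu: List[float], L: Matrix) -> float:
    """Negative log-likelihood of y under N(mu, L L^T)."""
    dim = len(y)
    z = forward_substitute(L, [yi - mi for yi, mi in zip(y, mu)])  # z = L^{-1}(y - mu)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(dim))     # log det Sigma
    mahalanobis = sum(zi * zi for zi in z)                           # (y-mu)^T Sigma^{-1} (y-mu)
    return 0.5 * (dim * math.log(2.0 * math.pi) + log_det + mahalanobis)
```

Only a triangular solve is needed per sample, so the cost is O(D²), which is negligible for D = 5.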
Findings
Performs comparably to state-of-the-art methods in point estimation.
Provides uncertainty estimates for each dimension.
Captures correlations between different speech quality aspects.
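Both of the last two findings can be read directly off a predicted covariance matrix. Assuming Σ = L Lᵀ from a Cholesky factor L, the uncertainty of dimension i is its standard deviation σᵢ = √Σᵢᵢ, and the correlation between dimensions i and j is ρᵢⱼ = Σᵢⱼ / (σᵢ σⱼ). The minimal sketch below computes both; the function names and the two-dimensional example are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def covariance_from_cholesky(L: Matrix) -> Matrix:
    """Sigma = L L^T for a lower-triangular Cholesky factor L."""
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)] for i in range(n)]


def std_and_correlation(sigma: Matrix) -> Tuple[List[float], Matrix]:
    """Per-dimension standard deviations and the correlation matrix of Sigma."""
    n = len(sigma)
    std = [math.sqrt(sigma[i][i]) for i in range(n)]
    corr = [[sigma[i][j] / (std[i] * std[j]) for j in range(n)] for i in range(n)]
    return std, corr


if __name__ == "__main__":
    # Hypothetical two-dimensional example (e.g. MOS and noisiness).
    std, corr = std_and_correlation(covariance_from_cholesky([[2.0, 0.0], [1.0, 3.0]]))
    print("std:", std)
    print("corr:", corr)
```

For example, L = [[2, 0], [1, 3]] gives Σ = [[4, 2], [2, 10]], so σ = (2, √10) and ρ₁₂ = 1/√10 ≈ 0.32.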
Abstract
The mean opinion score (MOS) is a standard metric for assessing speech quality, but its singular focus fails to identify specific distortions when low scores are observed. The NISQA dataset addresses this limitation by providing ratings across four additional dimensions: noisiness, coloration, discontinuity, and loudness, alongside MOS. In this paper, we extend the previously explored univariate MOS estimation to a multivariate framework by modeling these dimensions jointly using a multivariate Gaussian distribution. Our approach utilizes Cholesky decomposition to predict covariances without imposing restrictive assumptions and extends probabilistic affine transformations to a multivariate context. Experimental results show that our model performs on par with state-of-the-art methods in point estimation, while uniquely providing uncertainty and correlation estimates across speech quality…
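The abstract does not spell out the multivariate probabilistic affine transformation. A natural reading, which is an assumption here, is the standard Gaussian identity: if x ~ N(μ, Σ), then A x + b ~ N(A μ + b, A Σ Aᵀ). With Σ = L Lᵀ, the transformed covariance factors as (A L)(A L)ᵀ, so the Cholesky factor can be propagated without forming Σ. The matrix A, the offset b, and the function names below are hypothetical; how the paper actually parameterizes A and b (for example, per dataset or per dimension) is not stated in this summary.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(M: Matrix) -> Matrix:
    return [list(row) for row in zip(*M)]


def affine_gaussian(mu: Vector, L: Matrix, A: Matrix, b: Vector) -> Tuple[Vector, Matrix]:
    """If x ~ N(mu, L L^T), then A x + b ~ N(A mu + b, (A L)(A L)^T).

    Returns the transformed mean and the square-root factor A L.
    """
    new_mu = [sum(A[i][k] * mu[k] for k in range(len(mu))) + b[i] for i in range(len(A))]
    return new_mu, matmul(A, L)


def covariance(root: Matrix) -> Matrix:
    """Rebuild the covariance from any square-root factor R: Sigma = R R^T."""
    return matmul(root, transpose(root))
```

When A is lower triangular with a positive diagonal (a per-dimension scaling, for instance), A L is itself lower triangular with a positive diagonal and so remains a valid Cholesky factor; for a general A it is still a valid square-root factor of the new covariance, just not triangular.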
Taxonomy
Topics: Speech and Audio Processing · Image and Video Quality Assessment · Emotion and Mood Recognition
Methods: Focus
