MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment
Karl El Hajal, Milos Cernak, Pablo Mainar

TL;DR
MOSRA is a novel non-intrusive speech quality metric that jointly predicts room acoustics parameters and overall speech quality, enhancing accuracy, generalization, and explainability in diverse acoustic environments.
Contribution
This paper introduces MOSRA, a joint model that estimates room acoustics parameters and overall speech quality. It improves over prior methods through multi-dimensional training with explicit acoustics prediction.
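As a rough illustration of joint training, the sketch below combines one regression loss per target into a single objective. The target names (MOS, SNR, STI, T60, DRR, C50) come from the abstract. The use of mean squared error, the per-target weights, and the rule of skipping targets that have no label in a batch are assumptions made for illustration. This is not the authors' published loss.

```python
from typing import Dict, List, Optional

# Target names from the abstract; the loss form below is a hypothetical sketch.
TASKS = ("MOS", "SNR", "STI", "T60", "DRR", "C50")


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error over a batch of predictions."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(
    preds: Dict[str, List[float]],
    targets: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-target MSE losses (assumed multi-task objective).

    Targets absent from `targets` (e.g. no MOS label for this batch) are
    skipped, so acoustics labels can still train the shared features.
    """
    weights = weights or {}
    total = 0.0
    for task in TASKS:
        if task not in targets:
            continue  # no label for this target in the batch
        total += weights.get(task, 1.0) * mse(preds[task], targets[task])
    return total


if __name__ == "__main__":
    preds = {"MOS": [3.0, 4.0], "SNR": [10.0, 20.0]}
    targets = {"MOS": [3.5, 4.0], "SNR": [12.0, 20.0]}
    print(joint_loss(preds, targets))                # 2.125 (0.125 MOS + 2.0 SNR)
    print(joint_loss(preds, targets, {"MOS": 2.0}))  # 2.25 (MOS weighted twice)
```

Because SNR and T60 sit on very different numeric scales from STI or MOS, a real setup would likely normalize each target or tune the weights. Equal default weights are only a placeholder here.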
Findings
Joint training improves speech quality prediction accuracy.
Explicit acoustics prediction enhances model generalization.
The model offers better explainability of speech quality assessments.
Abstract
The acoustic environment can degrade speech quality during communication (e.g., video call, remote presentation, outside voice recording), and its impact is often unknown. Objective metrics for speech quality have proven challenging to develop given the multi-dimensionality of factors that affect speech quality and the difficulty of collecting labeled data. Hypothesizing the impact of acoustics on speech quality, this paper presents MOSRA: a non-intrusive multi-dimensional speech quality metric that can predict room acoustics parameters (SNR, STI, T60, DRR, and C50) alongside the overall mean opinion score (MOS) for speech quality. By explicitly optimizing the model to learn these room acoustics parameters, we can extract more informative features and improve the generalization for the MOS task when the training data is limited. Furthermore, we also show that this joint training method…
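For readers unfamiliar with two of the predicted parameters, the sketch below computes C50 and DRR from a sampled room impulse response. C50 (clarity) is the energy in the first 50 ms after the direct sound relative to the remaining energy, in dB. DRR (direct-to-reverberant ratio) is the energy of the direct sound relative to the later reverberant energy, in dB. The sketch assumes two common conventions: the largest absolute sample marks the direct-path arrival, and the direct sound occupies a ±2.5 ms window around it. The source does not say how MOSRA's training labels were computed, and SNR, STI, and T60 are not covered here.

```python
import math
from typing import List


def direct_index(h: List[float]) -> int:
    """Index of the direct-path arrival, assumed to be the largest |h[n]|."""
    return max(range(len(h)), key=lambda n: abs(h[n]))


def energy(samples: List[float]) -> float:
    """Sum of squared samples."""
    return sum(x * x for x in samples)


def c50_db(h: List[float], fs: int) -> float:
    """Clarity C50: energy in the first 50 ms from the direct sound over later energy, in dB."""
    n0 = direct_index(h)
    split = n0 + int(round(0.050 * fs))
    return 10.0 * math.log10(energy(h[n0:split]) / energy(h[split:]))


def drr_db(h: List[float], fs: int, half_window_s: float = 0.0025) -> float:
    """Direct-to-reverberant ratio: energy within +/- half_window_s of the
    direct peak over all later energy, in dB."""
    n0 = direct_index(h)
    w = int(round(half_window_s * fs))
    direct = energy(h[max(0, n0 - w): n0 + w + 1])
    reverberant = energy(h[n0 + w + 1:])
    return 10.0 * math.log10(direct / reverberant)


if __name__ == "__main__":
    fs = 1000  # toy sample rate: one sample per millisecond
    h = [0.0] * 200
    h[10] = 1.0   # direct sound
    h[40] = 0.5   # reflection 30 ms later (counts as early for C50)
    h[100] = 0.5  # reflection 90 ms later (late)
    print(round(c50_db(h, fs), 2))  # 6.99, i.e. 10*log10(1.25 / 0.25)
    print(round(drr_db(h, fs), 2))  # 3.01, i.e. 10*log10(1.0 / 0.5)
```

Both functions divide by the late or reverberant energy, so an impulse response with no energy after the split point would need special handling.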
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
