Multi-Channel MOSRA: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and a Teacher Model
Jozef Coldenhoff, Andrew Harper, Paul Kendrick, Tijana Stojkovic, Milos Cernak

TL;DR
This paper introduces a multi-channel model for joint prediction of room acoustics and speech quality metrics, leveraging simulated data and a teacher model, achieving improved accuracy with less computation.
Contribution
It presents the first multi-channel approach for MOS and room acoustics prediction, utilizing simulated data and a teacher-student setup for training.
Findings
The multi-channel model outperforms the single-channel model in predicting room acoustics.
Achieves roughly 5× less computation while maintaining performance.
Improves prediction of direct-to-reverberation ratio, clarity, and speech transmission index.
Abstract
Previous methods for predicting room acoustic parameters and speech quality metrics have focused on the single-channel case, where room acoustics and Mean Opinion Score (MOS) are predicted for a single recording device. However, quality-based device selection for rooms with multiple recording devices may benefit from a multi-channel approach where the descriptive metrics are predicted for multiple devices in parallel. Following our hypothesis that a model may benefit from multi-channel training, we develop a multi-channel model for joint MOS and room acoustics prediction (MOSRA) for five channels in parallel. The lack of multi-channel audio data with ground truth labels necessitated the creation of simulated data using an acoustic simulator with room acoustic labels extracted from the generated impulse responses and labels for MOS generated in a student-teacher setup using a…
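As a rough illustration of how room acoustic labels can be read off an impulse response, the sketch below computes a direct-to-reverberation ratio (DRR) and a C50-style clarity value for five synthetic channels. It is not the authors' pipeline: the `toy_rir` generator, the 2.5 ms direct-path window, the 50 ms early/late split, and all function names are assumptions made for this example, and the speech transmission index is omitted because it needs a more involved modulation analysis.

```python
import numpy as np

def toy_rir(fs=16000, rt60=0.5, direct_gain=1.0, length_s=1.0, seed=0):
    """Hypothetical impulse response: direct-path spike plus a decaying noise tail."""
    rng = np.random.default_rng(seed)
    n = int(fs * length_s)
    t = np.arange(n) / fs
    # Amplitude envelope that falls by 60 dB at t = rt60 (ln(1000) is about 6.908).
    rir = 0.05 * rng.standard_normal(n) * np.exp(-6.908 * t / rt60)
    rir[0] = direct_gain  # place the direct path at sample 0
    return rir

def clarity_db(rir, fs, early_ms=50.0):
    """Early-to-late energy ratio in dB (C50 when early_ms is 50)."""
    energy = np.asarray(rir, dtype=float) ** 2
    onset = int(np.argmax(energy))  # assume the strongest tap is the direct path
    split = onset + int(round(early_ms * 1e-3 * fs))
    return 10.0 * np.log10(energy[onset:split].sum() / energy[split:].sum())

def drr_db(rir, fs, window_ms=2.5):
    """Direct-to-reverberation ratio in dB, using a short window around the onset."""
    energy = np.asarray(rir, dtype=float) ** 2
    onset = int(np.argmax(energy))
    half = int(round(window_ms * 1e-3 * fs))
    lo, hi = max(onset - half, 0), onset + half + 1
    return 10.0 * np.log10(energy[lo:hi].sum() / energy[hi:].sum())

# Five channels, e.g. five devices at increasing distance from the talker
# (modelled here only by a weaker direct path; an assumption of this sketch).
fs = 16000
gains = [1.0, 0.9, 0.8, 0.65, 0.5]
rirs = np.stack([toy_rir(fs=fs, direct_gain=g, seed=i) for i, g in enumerate(gains)])

# One row of labels per channel: [DRR in dB, clarity in dB].
labels = np.array([[drr_db(h, fs), clarity_db(h, fs)] for h in rirs])
print(labels.shape)  # (5, 2)
```

In a multi-channel setup, a label matrix like this, one row per device, would serve as the regression target for all channels at once. A real acoustic simulator would replace `toy_rir`.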
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Hearing Loss and Rehabilitation
