MMMOS: Multi-domain Multi-axis Audio Quality Assessment

Yi-Cheng Lin; Jia-Hung Chen; Hung-yi Lee

arXiv:2507.04094 · eess.AS · January 13, 2026

TL;DR

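MMMOS is a no-reference audio quality assessment system that scores four perceptual axes (Production Quality, Production Complexity, Content Enjoyment, and Content Usefulness) across speech, music, and environmental sounds. An ensemble of its top eight models reduces mean squared error by 20-30% and raises Kendall's τ by 4-5% relative to the baseline.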

Contribution

It introduces a multi-domain, multi-axis no-reference audio quality assessment framework using ensemble learning and multiple pretrained encoders, addressing limitations of single-score models.

Findings

01. 20-30% reduction in mean squared error versus the baseline

02. 4-5% increase in Kendall's τ versus the baseline

03. First place in six of eight Production Complexity metrics
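
The first two findings are stated in terms of mean squared error and Kendall's τ. As a minimal sketch of what those two metrics measure, the pure-Python snippet below computes both for a list of human ratings and model predictions. The function names and the toy ratings are hypothetical, and the tau-a variant (no tie correction) is an assumption; the page does not say which variant the authors report.

```python
from itertools import combinations


def mean_squared_error(y_true, y_pred):
    """Average squared difference between ratings and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def kendall_tau_a(y_true, y_pred):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Assumption: tau-a is used here for simplicity; tied pairs count
    as neither concordant nor discordant.
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length inputs with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical toy data: four clips rated on a single axis.
human = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.0, 4.0]

mse = mean_squared_error(human, predicted)
tau = kendall_tau_a(human, predicted)
print(f"MSE = {mse:.3f}, Kendall tau-a = {tau:.3f}")
```

Lower MSE means predictions sit closer to the human ratings, while a higher τ means the model ranks clips in an order closer to the human ranking, so the two metrics can move independently.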

Abstract

Accurate audio quality estimation is essential for developing and evaluating audio generation, retrieval, and enhancement systems. Existing non-intrusive assessment models predict a single Mean Opinion Score (MOS) for speech, merging diverse perceptual factors and failing to generalize beyond speech. We propose MMMOS, a no-reference, multi-domain audio quality assessment system that estimates four orthogonal axes: Production Quality, Production Complexity, Content Enjoyment, and Content Usefulness across speech, music, and environmental sounds. MMMOS fuses frame-level embeddings from three pretrained encoders (WavLM, MuQ, and M2D) and evaluates three aggregation strategies with four loss functions. By ensembling the top eight models, MMMOS shows a 20-30% reduction in mean squared error and a 4-5% increase in Kendall's τ versus baseline, gains first place in six of eight Production…
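
The abstract mentions ensembling the top eight models but, in the portion shown here, does not say how models are ranked or how their outputs are combined. The sketch below illustrates one plausible reading under loudly stated assumptions: models are ranked by validation MSE and their per-axis predictions are averaged with equal weight. All names (`select_top_k`, `ensemble_average`, the model labels, and the scores) are hypothetical and are not taken from the paper.

```python
# The four axes named in the abstract.
AXES = (
    "production_quality",
    "production_complexity",
    "content_enjoyment",
    "content_usefulness",
)


def select_top_k(val_mse_by_model, k):
    """Return the k model names with the lowest validation MSE.

    Assumption: ranking by validation MSE; the source does not
    state the selection criterion.
    """
    if k < 1 or k > len(val_mse_by_model):
        raise ValueError("k must be between 1 and the number of models")
    return sorted(val_mse_by_model, key=val_mse_by_model.get)[:k]


def ensemble_average(predictions_by_model, model_names):
    """Equal-weight average of per-axis predictions over the chosen models.

    Assumption: a plain mean; the source does not state the
    combination rule.
    """
    if not model_names:
        raise ValueError("need at least one model to ensemble")
    return {
        axis: sum(predictions_by_model[m][axis] for m in model_names)
        / len(model_names)
        for axis in AXES
    }


# Hypothetical validation errors and predictions for one audio clip.
val_mse = {"model_a": 0.30, "model_b": 0.20, "model_c": 0.50}
preds = {
    "model_a": dict(zip(AXES, (6.0, 4.0, 5.0, 7.0))),
    "model_b": dict(zip(AXES, (7.0, 5.0, 6.0, 8.0))),
    "model_c": dict(zip(AXES, (2.0, 2.0, 2.0, 2.0))),
}

chosen = select_top_k(val_mse, k=2)
scores = ensemble_average(preds, chosen)
print(chosen)
print(scores)
```

With `k=8` and a larger pool of trained models, the same two functions would mirror the "top eight" setup described in the abstract, again only under the averaging assumption made here.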

Taxonomy

Topics: Hearing Loss and Rehabilitation · Industrial Vision Systems and Defect Detection · Ultrasonics and Acoustic Wave Propagation