MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation
Di Zhu, Zixuan Li

TL;DR
MuQ-Eval is an open-source, lightweight, per-sample quality metric for AI music generation that correlates strongly with human judgments and can be personalized with minimal data.
Contribution
Introduces MuQ-Eval, an open-source per-sample music quality metric built by training lightweight prediction heads on frozen MuQ-310M features. The metric correlates strongly with human ratings and supports personalized evaluation.
Findings
The simplest model achieves system-level SRCC = 0.957 and utterance-level SRCC = 0.838 against human mean opinion scores.
Frozen MuQ features capture quality-relevant information effectively.
Models trained on few clips can produce usable personalized evaluations.
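The two correlation levels in the first finding can be illustrated with a small sketch. It assumes the common convention that utterance-level SRCC ranks individual clips, while system-level SRCC first averages predicted and human scores per generating system and then ranks the systems; the source does not spell out its exact protocol. The function names and the toy `records` are hypothetical and are not results from the paper.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def utterance_level_srcc(records):
    """records: iterable of (system_id, predicted_score, human_score) per clip."""
    return srcc([r[1] for r in records], [r[2] for r in records])


def system_level_srcc(records):
    """Average scores per system first (assumed convention), then correlate."""
    predicted, human = defaultdict(list), defaultdict(list)
    for system, pred, mos in records:
        predicted[system].append(pred)
        human[system].append(mos)
    systems = sorted(predicted)
    return srcc([mean(predicted[s]) for s in systems],
                [mean(human[s]) for s in systems])


# Toy data (made up for illustration): two clips from each of three systems.
records = [
    ("sysA", 2.0, 2.5), ("sysA", 3.0, 2.0),
    ("sysB", 3.5, 3.0), ("sysB", 4.5, 4.0),
    ("sysC", 1.0, 1.5), ("sysC", 2.5, 1.0),
]
```

On this toy data the system ranking matches the human ranking exactly while the clip ranking does not, which is why a system-level correlation is typically higher than an utterance-level one.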
Abstract
Distributional metrics such as Fréchet Audio Distance cannot score individual music clips and correlate poorly with human judgments, while the only per-sample learned metric achieving high human correlation is closed-source. We introduce MuQ-Eval, an open-source per-sample quality metric for AI-generated music built by training lightweight prediction heads on frozen MuQ-310M features using MusicEval, a dataset of generated clips from 31 text-to-music systems with expert quality ratings. Our simplest model, frozen features with attention pooling and a two-layer MLP, achieves system-level SRCC = 0.957 and utterance-level SRCC = 0.838 with human mean opinion scores. A systematic ablation over training objectives and adaptation strategies shows that no addition meaningfully improves the frozen baseline, indicating that frozen MuQ representations already capture quality-relevant…
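The "attention pooling and a two-layer MLP" head described in the abstract can be sketched as a forward pass over pre-extracted frame features. This is a minimal, dependency-free illustration and not the authors' implementation: the single learned query vector, the ReLU activation, the parameter layout in `params`, and all numeric values are assumptions, and a practical version would use a tensor library and train the parameters against human ratings.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse T frame vectors into one clip vector.

    Each frame is scored by its dot product with a learned query vector
    (an assumed form of attention pooling); the softmax of the scores
    gives the weights of the weighted average.
    """
    scores = [sum(q * h for q, h in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


def linear(x, weight, bias):
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def predict_quality(frames, params):
    """Two-layer MLP on the pooled vector; ReLU is an assumed activation."""
    pooled, _ = attention_pool(frames, params["query"])
    hidden = [max(0.0, v) for v in linear(pooled, params["w1"], params["b1"])]
    return linear(hidden, params["w2"], params["b2"])[0]


# Toy stand-in for frozen encoder output: 3 frames of 2-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
params = {
    "query": [0.0, 0.0],  # a zero query reduces attention pooling to a mean
    "w1": [[1.0, 1.0], [-1.0, 0.5]], "b1": [0.0, 0.0],
    "w2": [[1.5, 2.0]], "b2": [0.5],
}
score = predict_quality(frames, params)
```

Only `params` would be trained in this setup; the features in `frames` come from the frozen encoder and are never updated, which is what keeps the head lightweight.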
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Speech and Audio Processing
