Song Aesthetics Evaluation with Multi-Stem Attention and Hierarchical Uncertainty Modeling
Yishan Lv, Jing Luo, Boyuan Ju, Yang Zhang, Xinda Wu, Bo Yuan, Xinyu Yang

TL;DR
This paper introduces a framework for automated song aesthetics evaluation that combines multi-stem attention with hierarchical uncertainty modeling to capture complex musical features and the nuances of human perception.
Contribution
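It proposes two modules, Multi-Stem Attention Fusion (MSAF) and Hierarchical Granularity-Aware Interval Aggregation (HiGIA), to better model musical features and score uncertainty, moving beyond conventional methods that directly predict a single precise MOS value.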
Findings
Outperforms existing models on SongEval and internal datasets
Effectively captures complex musical features and the nuances of human perception
Achieves stronger performance in multi-dimensional song aesthetics evaluation
Abstract
Music generative artificial intelligence (AI) is rapidly expanding music content, necessitating automated song aesthetics evaluation. However, existing studies largely focus on speech, audio or singing quality, leaving song aesthetics underexplored. Moreover, conventional approaches often predict a precise Mean Opinion Score (MOS) value directly, which struggles to capture the nuances of human perception in song aesthetics evaluation. This paper proposes a song-oriented aesthetics evaluation framework, featuring two novel modules: 1) Multi-Stem Attention Fusion (MSAF) builds bidirectional cross-attention between mixture-vocal and mixture-accompaniment pairs, fusing them to capture complex musical features; 2) Hierarchical Granularity-Aware Interval Aggregation (HiGIA) learns multi-granularity score probability distributions, aggregates them into a score interval, and applies a…
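The abstract describes both modules only at a high level, and its HiGIA description is cut off after the aggregation step, so that final step is not sketched here. The block below is a minimal NumPy sketch of the two described ideas under loudly stated assumptions, not the authors' implementation: single-head attention with random projection weights, fusion by concatenating time-averaged attention outputs, scores on an assumed 1 to 5 scale binned at three hypothetical granularities (bin widths 1.0, 0.5, 0.25), and an interval taken as the range of the per-granularity expected scores. The names `msaf_sketch`, `higia_sketch`, `params`, and `heads` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context, wq, wk, wv):
    """Single-head scaled dot-product attention: `query` frames attend to `context` frames."""
    q, k, v = query @ wq, context @ wk, context @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def msaf_sketch(mix, vocal, accomp, params):
    """Hypothetical MSAF-style fusion: bidirectional cross-attention for the
    mixture-vocal and mixture-accompaniment pairs, fused by concatenating
    time-averaged outputs (the fusion rule is an assumption)."""
    streams = [
        cross_attention(mix, vocal, *params["mix_to_vocal"]),
        cross_attention(vocal, mix, *params["vocal_to_mix"]),
        cross_attention(mix, accomp, *params["mix_to_accomp"]),
        cross_attention(accomp, mix, *params["accomp_to_mix"]),
    ]
    return np.concatenate([s.mean(axis=0) for s in streams])

def higia_sketch(feature, heads, score_min=1.0, score_max=5.0):
    """Hypothetical HiGIA-style head: one score distribution per granularity
    (bin width), aggregated into an interval [lo, hi] spanned by the
    per-granularity expected scores (the aggregation rule is an assumption)."""
    expectations, dists = [], []
    for width, w in heads.items():
        n_bins = int(round((score_max - score_min) / width))
        edges = np.linspace(score_min, score_max, n_bins + 1)
        centers = (edges[:-1] + edges[1:]) / 2
        probs = softmax(feature @ w)  # distribution over this granularity's bins
        dists.append(probs)
        expectations.append(float(probs @ centers))
    lo, hi = min(expectations), max(expectations)
    return (lo, hi), (lo + hi) / 2, dists

# Usage with random stand-in features (T frames, d dims per stem).
rng = np.random.default_rng(0)
d, T = 8, 20
mix, vocal, accomp = (rng.normal(size=(T, d)) for _ in range(3))
params = {
    name: tuple(rng.normal(size=(d, d)) * 0.1 for _ in range(3))
    for name in ("mix_to_vocal", "vocal_to_mix", "mix_to_accomp", "accomp_to_mix")
}
fused = msaf_sketch(mix, vocal, accomp, params)  # shape (32,) = 4 streams * d
heads = {
    width: rng.normal(size=(fused.size, int(round(4.0 / width)))) * 0.1
    for width in (1.0, 0.5, 0.25)
}
(lo, hi), point, dists = higia_sketch(fused, heads)
print(fused.shape, round(lo, 2), round(hi, 2), round(point, 2))
```

Taking the interval from the spread of per-granularity expectations is only one simple way to turn several distributions into a range: when coarse and fine heads disagree, the interval widens. This loosely mirrors the idea of modeling score uncertainty instead of regressing one exact MOS value, but the paper's actual aggregation may differ.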
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
