HEAR: Hierarchically Enhanced Aesthetic Representations For Multidimensional Music Evaluation

Shuyang Liu; Yuan Jin; Rui Lin; Shizhe Chen; Junyu Dai; Tao Jiang

arXiv:2511.18869·cs.SD·January 1, 2026

TL;DR

HEAR is a comprehensive framework for music aesthetic evaluation that leverages multi-scale features, hierarchical augmentation, and hybrid loss functions to improve accuracy and robustness in assessing song quality.

Contribution

The paper introduces HEAR, a novel approach combining multi-source features, hierarchical augmentation, and hybrid training objectives for enhanced music aesthetic evaluation.

Findings

1. Outperforms the baseline across all metrics on both tracks of the ICASSP 2026 SongEval benchmark.

2. Mitigates overfitting through a hierarchical augmentation strategy.

3. Achieves accurate scoring and reliable top-tier song identification by combining regression and ranking losses.

Abstract

Evaluating song aesthetics is challenging due to the multidimensional nature of musical perception and the scarcity of labeled data. We propose HEAR, a robust music aesthetic evaluation framework that combines: (1) a multi-source, multi-scale representation module that extracts complementary segment- and track-level features, (2) a hierarchical augmentation strategy to mitigate overfitting, and (3) a hybrid training objective that integrates regression and ranking losses for accurate scoring and reliable top-tier song identification. Experiments demonstrate that HEAR consistently outperforms the baseline across all metrics on both tracks of the ICASSP 2026 SongEval benchmark. The code and trained model weights are available at https://github.com/Eps-Acoustic-Revolution-Lab/EAR_HEAR.
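
As an illustration of what a hybrid regression-plus-ranking objective can look like, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss forms, so the mean squared error term, the pairwise margin ranking term, the weight `alpha`, and the `margin` value are all assumptions chosen for illustration, as are the function names.

```python
from itertools import combinations
from typing import Sequence


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Regression term: mean squared error between predicted and annotated scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pairwise_ranking_loss(
    pred: Sequence[float], target: Sequence[float], margin: float = 0.1
) -> float:
    """Ranking term (assumed form): hinge loss over all pairs with different labels.

    A pair is penalized when the predicted ordering disagrees with the
    annotated ordering, or agrees by less than `margin`.
    """
    losses = []
    for i, j in combinations(range(len(pred)), 2):
        if target[i] == target[j]:
            continue  # tied labels carry no ordering information
        sign = 1.0 if target[i] > target[j] else -1.0
        losses.append(max(0.0, margin - sign * (pred[i] - pred[j])))
    return sum(losses) / len(losses) if losses else 0.0


def hybrid_loss(
    pred: Sequence[float],
    target: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight on the ranking term
    margin: float = 0.1,  # hypothetical ranking margin
) -> float:
    """Weighted sum of the regression and ranking terms."""
    return mse_loss(pred, target) + alpha * pairwise_ranking_loss(pred, target, margin)


if __name__ == "__main__":
    annotated = [1.0, 2.0, 3.0]
    print(hybrid_loss([1.0, 2.0, 3.0], annotated))  # perfect scores and ordering
    print(hybrid_loss([3.0, 2.0, 1.0], annotated))  # reversed ordering is penalized twice
```

The regression term pushes each predicted score toward its annotation, while the ranking term only cares about relative order, which is the property that matters when the goal is to pick out the top-rated songs.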

Code & Models

Models

earlab/EAR_HEAR (Hugging Face model)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception