Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation
Jai Dhiman

TL;DR
This paper demonstrates that pre-trained audio foundation models significantly outperform symbolic MIDI representations in automated piano performance evaluation, capturing nuanced expressive qualities more effectively.
Contribution
The paper applies the pre-trained audio foundation models MuQ and MERT to piano performance evaluation and reports a 55% relative improvement in R^2 over a symbolic (MIDI) baseline.
Findings
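The best audio model (MuQ layers 9-12 with Pianoteq soundfont augmentation) achieves R^2 = 0.537 (95% CI: [0.465, 0.575]), against R^2 = 0.347 for the symbolic baseline.
Audio outperforms symbolic on all 19 perceptual dimensions, and the overall difference is statistically significant (p < 10^-25).
Fusing audio and symbolic models offers minimal benefit because their errors are highly correlated.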
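The figures in these findings can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the R^2 definition is the standard coefficient of determination (how the paper aggregates R^2 across the 19 dimensions is not stated here), and the fusion formula assumes a plain average of two predictors with equal error variance, which may not match the paper's fusion method.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def relative_improvement(r2_new, r2_base):
    """Relative gain of one R^2 value over a baseline R^2 value."""
    return (r2_new - r2_base) / r2_base


def error_correlation(y_true, pred_a, pred_b):
    """Pearson correlation between the residuals of two predictors."""
    y_true = np.asarray(y_true, dtype=float)
    err_a = np.asarray(pred_a, dtype=float) - y_true
    err_b = np.asarray(pred_b, dtype=float) - y_true
    return float(np.corrcoef(err_a, err_b)[0, 1])


def averaged_error_variance(sigma2, rho):
    """Error variance of the average of two predictors.

    Assumption (not from the paper): both predictors have the same error
    variance sigma2 and their errors have correlation rho.
    Var((e_a + e_b) / 2) = sigma2 * (1 + rho) / 2.
    """
    return sigma2 * (1.0 + rho) / 2.0


# Reported values: audio R^2 = 0.537, symbolic R^2 = 0.347.
gain = relative_improvement(0.537, 0.347)
print(f"relative improvement: {100 * gain:.1f}%")

# With highly correlated errors, averaging removes little error variance.
print(averaged_error_variance(1.0, 0.9))
```

Under these assumptions, the relative gain works out to roughly 55%, and averaging two predictors whose errors correlate at 0.9 (an illustrative value, not one from the paper) removes only about 5% of the error variance, which is consistent with fusion adding little.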
Abstract
Automated piano performance evaluation traditionally relies on symbolic (MIDI) representations, which capture note-level information but miss the acoustic nuances that characterize expressive playing. I propose using pre-trained audio foundation models, specifically MuQ and MERT, to predict 19 perceptual dimensions of piano performance quality. Using synthesized audio from PercePiano MIDI files (rendered via Pianoteq), I compare audio and symbolic approaches under controlled conditions where both derive from identical source data. The best model, MuQ layers 9-12 with Pianoteq soundfont augmentation, achieves R^2 = 0.537 (95% CI: [0.465, 0.575]), representing a 55% improvement over the symbolic baseline (R^2 = 0.347). Statistical analysis confirms significance (p < 10^-25) with audio outperforming symbolic on all 19 dimensions. I validate the approach through cross-soundfont…
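As a rough illustration of what "MuQ layers 9-12" could mean in practice, the sketch below pools hidden states from a range of encoder layers into a single clip-level embedding. Everything in it is an assumption: the function name `pool_layers` is hypothetical, the array layout (layers, frames, features) is invented for the example, and averaging over layers and then over time is only one plausible pooling choice, not necessarily the one used in the paper. No real MuQ or MERT API is called.

```python
import numpy as np


def pool_layers(hidden_states, layers=(9, 10, 11, 12)):
    """Average selected layers, then average over time frames.

    hidden_states: array of shape (num_layers, num_frames, feature_dim),
    a hypothetical layout for per-layer encoder outputs.
    layers: indices into the first axis; whether index 0 is an embedding
    layer depends on the model and is not specified here.
    Returns a vector of shape (feature_dim,).
    """
    hs = np.asarray(hidden_states, dtype=float)
    selected = hs[list(layers)]          # (len(layers), num_frames, feature_dim)
    layer_mean = selected.mean(axis=0)   # (num_frames, feature_dim)
    return layer_mean.mean(axis=0)       # (feature_dim,)


# Toy input: 13 layers, 4 time frames, 2 features per frame.
toy = np.arange(13 * 4 * 2, dtype=float).reshape(13, 4, 2)
embedding = pool_layers(toy)
print(embedding.shape)
```

A regression head that maps such an embedding to the 19 perceptual dimensions would sit on top of this vector; its form is left unspecified here.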
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception
