Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation

Jai Dhiman

arXiv:2601.19029·cs.SD·January 28, 2026

TL;DR

This paper demonstrates that pre-trained audio foundation models significantly outperform symbolic MIDI representations in automated piano performance evaluation, capturing nuanced expressive qualities more effectively.

Contribution

It applies the pre-trained audio foundation models MuQ and MERT to piano performance evaluation, raising R^2 from 0.347 for the symbolic baseline to 0.537 for the best audio model.

Findings

1. The best audio model (MuQ layers 9-12) achieves R^2 = 0.537, versus R^2 = 0.347 for the symbolic baseline.

2. Audio outperforms symbolic on all 19 perceptual dimensions, and the overall difference is statistically significant (p < 10^-25).

3. Fusing audio and symbolic representations offers minimal benefit because the errors of the two approaches are highly correlated.
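
The third finding can be illustrated with a standard variance identity. The sketch below is not the paper's fusion method, which this page does not describe. It assumes, purely for illustration, two unbiased predictors with equal error variance `sigma2` and error correlation `rho`, combined by an equal-weight average. Under those assumptions the fused error variance is `sigma2 * (1 + rho) / 2`, so the gain shrinks to zero as `rho` approaches 1. The function name is hypothetical.

```python
def fused_error_variance(sigma2, rho):
    """Error variance of the equal-weight average of two predictors.

    Assumes (for illustration only) that both predictors have the same
    error variance sigma2 and that their errors have correlation rho:
    Var((e1 + e2) / 2) = (sigma2 + sigma2 + 2 * rho * sigma2) / 4.
    """
    return sigma2 * (1.0 + rho) / 2.0


# Fraction of error variance removed by fusion at several correlations.
for rho in (0.0, 0.5, 0.9, 1.0):
    reduction = 1.0 - fused_error_variance(1.0, rho)
    print(f"rho={rho:.1f}: variance reduction = {reduction:.2f}")
```

With uncorrelated errors the average halves the error variance, while at a correlation of 0.9 it removes only 5% of it, which is consistent with fusion adding little when the two models tend to be wrong on the same performances.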

Abstract

Automated piano performance evaluation traditionally relies on symbolic (MIDI) representations, which capture note-level information but miss the acoustic nuances that characterize expressive playing. I propose using pre-trained audio foundation models, specifically MuQ and MERT, to predict 19 perceptual dimensions of piano performance quality. Using synthesized audio from PercePiano MIDI files (rendered via Pianoteq), I compare audio and symbolic approaches under controlled conditions where both derive from identical source data. The best model, MuQ layers 9-12 with Pianoteq soundfont augmentation, achieves R^2 = 0.537 (95% CI: [0.465, 0.575]), representing a 55% improvement over the symbolic baseline (R^2 = 0.347). Statistical analysis confirms significance (p < 10^-25) with audio outperforming symbolic on all 19 dimensions. I validate the approach through cross-soundfont…
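
The 55% figure in the abstract can be checked from the two reported R^2 values. The sketch below recomputes the relative improvement and, for reference, includes the textbook definition of R^2 (1 minus residual sum of squares over total sum of squares). Whether the paper aggregates R^2 across the 19 dimensions in exactly this way is an assumption, and the function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Textbook coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the standard definition, not necessarily the
    exact aggregation used in the paper.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Values reported in the abstract.
audio_r2 = 0.537
symbolic_r2 = 0.347

gain = relative_improvement(audio_r2, symbolic_r2)
print(f"Relative improvement: {gain:.1%}")  # prints "Relative improvement: 54.8%"
```

The result, about 54.8%, rounds to the 55% quoted in the abstract. Note that this is a relative gain in R^2, not a 55-point gain in explained variance: the absolute difference is 0.190.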

Code & Models

Models

CrescendAI/MuQ-Pianoteq-Piano-Eval (model)

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception