From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano

Huan Zhang; Jinhua Liang; Simon Dixon

arXiv:2407.04518·eess.AS·July 22, 2024·ISMIR

TL;DR

This paper benchmarks various audio encoding models on performance-level tasks in solo piano music, introducing a new dataset and demonstrating the effectiveness of domain-specific fine-tuning for nuanced musical understanding.

Contribution

It introduces the Pianism-Labelling Dataset (PLD) and evaluates pre-trained audio encoders on expertise ranking, difficulty estimation, and technique detection tasks.

Findings

01 Audio-MAE achieved the highest overall performance.

02 Best accuracy was 93.6% in expertise ranking.

03 Case study on Chopin data revealed challenges in top-tier performance assessment.

Abstract

Our study investigates an approach for understanding musical performances through the lens of audio encoding models, focusing on the domain of solo Western classical piano music. Compared to composition-level attribute understanding such as key or genre, we identify a knowledge gap in performance-level music understanding, and address three critical tasks: expertise ranking, difficulty estimation, and piano technique detection, introducing a comprehensive Pianism-Labelling Dataset (PLD) for this purpose. We leverage pre-trained audio encoders, specifically Jukebox, Audio-MAE, MERT, and DAC, demonstrating varied capabilities in tackling downstream tasks, to explore whether domain-specific fine-tuning enhances capability in capturing performance nuances. Our best approach achieved 93.6% accuracy in expertise ranking, 33.7% in difficulty estimation, and 46.7% in technique detection,…
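The abstract reports an accuracy figure for expertise ranking without saying, in the portion shown here, how that accuracy is computed. One common way to score a ranking task is pairwise accuracy: the fraction of recording pairs that a model orders the same way as the ground-truth labels. The sketch below illustrates that metric only. It is an assumption for illustration, not the paper's evaluation code, and the function name `pairwise_ranking_accuracy`, the label scale, and the example scores are all hypothetical.

```python
from itertools import combinations


def pairwise_ranking_accuracy(scores, labels):
    """Fraction of pairs with different labels whose predicted order is correct.

    scores: predicted expertise scores, higher meaning more expert (assumed).
    labels: ground-truth expertise levels as integers (assumed scale).
    """
    correct = 0
    total = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # tied ground-truth labels carry no ordering information
        total += 1
        # The pair is ordered correctly when both differences share a sign.
        if (scores[i] - scores[j]) * (labels[i] - labels[j]) > 0:
            correct += 1
    if total == 0:
        raise ValueError("need at least one pair with different labels")
    return correct / total


# Hypothetical data: four recordings, levels 0 (beginner) to 2 (advanced).
labels = [0, 1, 2, 2]
scores = [0.1, 0.4, 0.3, 0.9]
accuracy = pairwise_ranking_accuracy(scores, labels)
print(f"pairwise ranking accuracy: {accuracy:.2f}")
```

In this toy example five pairs have different labels and four of them are ordered correctly; the one mistake is the level-1 recording scored above a level-2 recording. Pairs with equal labels are skipped so that ties neither reward nor penalise the model.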

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing

Methods: Dense Connections · Dilated Convolution · Convolution · VQ-VAE · Position-Wise Feed-Forward Layer · Residual Connection · Layer Normalization · Jukebox · Dynamic Algorithm Configuration