Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
Zhanhong He, Hanyu Meng, David Huang, Roberto Togneri

TL;DR
This paper introduces a multi-task, multi-scale neural network that efficiently estimates piano dynamics and metrical structure from audio, achieving state-of-the-art results with a compact model suitable for large-scale music analysis.
Contribution
It presents a novel multi-task, multi-scale network that takes Bark-scale specific loudness as input, reducing model size from 14.7M to 0.5M parameters while improving performance in estimating piano dynamics and metrical features.
Findings
Achieves state-of-the-art results on the MazurkaBL dataset
Reduces model size from 14.7M to 0.5M parameters
Enables long-sequence processing for detailed musical analysis
Abstract
Estimating piano dynamics from audio recordings is a fundamental challenge in computational music analysis. In this paper, we propose an efficient multi-task network that jointly predicts dynamic levels, change points, beats, and downbeats from a shared latent representation. These four targets form the metrical structure of dynamics in the music score. Inspired by recent research on vocal dynamics, we use a multi-scale network as the backbone, which takes Bark-scale specific loudness as the input feature. Compared with log-Mel input, this reduces the model size from 14.7 M to 0.5 M parameters, enabling long sequential input. We segment the audio into 60-second excerpts, double the length commonly used in beat tracking. Evaluated on the public MazurkaBL dataset, our model achieves state-of-the-art results across all tasks. This work sets a new benchmark for piano dynamics estimation and…
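The 60-second segmentation mentioned in the abstract can be illustrated with a minimal sketch. Only the 60-second segment length comes from the abstract; the function name `segment_bounds`, the non-overlapping hop, and the choice to keep a shorter final segment are assumptions made for illustration, not details confirmed by the paper.

```python
def segment_bounds(total_seconds, segment_seconds=60.0, hop_seconds=60.0):
    """Return (start, end) times in seconds that cover a recording.

    Hypothetical helper: the 60 s default follows the abstract; the hop
    size and the shorter trailing segment are illustrative assumptions.
    """
    if segment_seconds <= 0 or hop_seconds <= 0:
        raise ValueError("segment_seconds and hop_seconds must be positive")
    bounds = []
    start = 0.0
    while start < total_seconds:
        # Clip the last segment to the end of the recording.
        end = min(start + segment_seconds, total_seconds)
        bounds.append((start, end))
        start += hop_seconds
    return bounds


# A 150-second recording yields two full segments and one 30-second remainder.
segments = segment_bounds(150.0)
```

Under these assumptions, each segment would be converted to the input feature and passed through the network independently; whether the authors overlap or pad segments is not stated in the abstract.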
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
