Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network

Zhanhong He; Hanyu Meng; David Huang; Roberto Togneri

arXiv:2510.18190 · eess.AS · February 4, 2026

Open Access

TL;DR

This paper introduces a multi-task, multi-scale neural network that efficiently estimates piano dynamics and metrical structure from audio, achieving state-of-the-art results with a compact model suitable for large-scale music analysis.

Contribution

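It presents a multi-task, multi-scale network that takes Bark-scale specific loudness as input. Compared with log-Mel input, this reduces model size from 14.7 M to 0.5 M parameters, and the model achieves state-of-the-art results on MazurkaBL for dynamic levels, change points, beats, and downbeats.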

Findings

01 Achieves state-of-the-art results across all tasks on the public MazurkaBL dataset

02 Reduces model size from 14.7 M to 0.5 M parameters compared with log-Mel input

03 Enables long-sequence processing with 60-second input segments

Abstract

Estimating piano dynamics from audio recordings is a fundamental challenge in computational music analysis. In this paper, we propose an efficient multi-task network that jointly predicts dynamic levels, change points, beats, and downbeats from a shared latent representation. These four targets form the metrical structure of dynamics in the music score. Inspired by recent research on vocal dynamics, we use a multi-scale network as the backbone, which takes Bark-scale specific loudness as the input feature. Compared with log-Mel input, this reduces the model size from 14.7 M to 0.5 M parameters, enabling long sequential input. We segment the audio into 60-second excerpts, twice the length commonly used in beat tracking. Evaluated on the public MazurkaBL dataset, our model achieves state-of-the-art results across all tasks. This work sets a new benchmark for piano dynamic estimation and…
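As a rough illustration of the 60-second segmentation mentioned in the abstract, the Python sketch below splits a recording of known duration into segment boundaries and works out the parameter reduction quoted above. Only the 60-second length and the 14.7 M and 0.5 M figures come from the abstract. The function names, the non-overlapping default hop, the shortened final segment, and the 50 Hz frame rate are assumptions made for illustration, not the authors' pipeline.

```python
from typing import List, Optional, Tuple

SEGMENT_SECONDS = 60.0  # segment length stated in the abstract


def segment_bounds(duration_s: float,
                   segment_s: float = SEGMENT_SECONDS,
                   hop_s: Optional[float] = None) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds covering a recording.

    Assumption: segments do not overlap unless hop_s is given, and the
    final segment is simply shorter (no padding). The abstract does not
    say how the authors handle either case.
    """
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration_s and segment_s must be positive")
    hop = segment_s if hop_s is None else hop_s
    if hop <= 0:
        raise ValueError("hop_s must be positive")

    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:  # the recording is fully covered
            break
        start += hop
    return bounds


def frames_per_segment(frame_rate_hz: float,
                       segment_s: float = SEGMENT_SECONDS) -> int:
    """Number of feature frames in one full segment.

    Assumption: frame_rate_hz is a hypothetical feature rate; the
    abstract does not state the one used in the paper.
    """
    return round(frame_rate_hz * segment_s)


# Parameter counts quoted in the abstract, in millions.
LOG_MEL_PARAMS_M = 14.7
BARK_PARAMS_M = 0.5
reduction_factor = LOG_MEL_PARAMS_M / BARK_PARAMS_M  # roughly 29x smaller

if __name__ == "__main__":
    print(segment_bounds(150.0))     # a hypothetical 2.5-minute recording
    print(frames_per_segment(50.0))  # hypothetical 50 Hz frame rate
    print(round(reduction_factor, 1))
```

With these assumptions, a 150-second recording yields two full 60-second segments and one 30-second remainder. A real pipeline would have to decide whether to pad, drop, or overlap that remainder.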

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing