Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

Keren Shao; Ke Chen; Taylor Berg-Kirkpatrick; Shlomo Dubnov

arXiv:2308.02723·cs.SD·August 8, 2023


TL;DR

This paper introduces input feature and training objective modifications to improve harmonic sensitivity and prediction stability in singing melody extraction models, demonstrating empirical effectiveness across multiple architectures.

Contribution

The paper proposes an input feature modification motivated by a harmonic decay assumption and a training objective motivated by a segment stability assumption. It applies both to existing models (MSNet, FTANet) and introduces a new model, PianoNet.

Findings

01 Enhanced harmonic sensitivity in models

02 Improved stability of melody contour predictions

03 Effective across multiple neural network architectures

Abstract

In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity to the trailing harmonics, we modify the Combined Frequency and Periodicity (CFP) representation using the discrete z-transform. Second, vocal and non-vocal segments of extremely short duration are uncommon. To ensure a more stable melody contour, we design a differentiable loss function that discourages the model from predicting such segments. We apply these modifications to several models, including MSNet, FTANet, and a newly introduced model, PianoNet, modified from a piano transcription network. Our experimental…
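
The two ideas in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not say how the z-transform is wired into CFP or what form the loss takes, so every function name, the radius parameter, and the penalty formula here are assumptions made for illustration. The first part shows only the standard identity that evaluating the z-transform on a circle of radius r equals a DFT of the signal weighted by r^(-n). The second part shows one hypothetical smooth penalty that grows when a single frame is predicted as vocal between two non-vocal frames, or the reverse.

```python
import cmath
import math


def z_transform_on_circle(x, radius):
    """Evaluate X(z) = sum_n x[n] * z**(-n) at z_k = radius * exp(2j*pi*k/N)."""
    n_samples = len(x)
    out = []
    for k in range(n_samples):
        z = radius * cmath.exp(2j * math.pi * k / n_samples)
        out.append(sum(x[n] * z ** (-n) for n in range(n_samples)))
    return out


def dft(x):
    """Plain discrete Fourier transform, i.e. the radius = 1 special case."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def isolated_frame_penalty(voicing_probs):
    """Hypothetical penalty on one-frame vocal / non-vocal segments.

    voicing_probs holds per-frame probabilities in [0, 1] that a frame is
    vocal. The penalty is a polynomial in those probabilities, so it is
    differentiable. It is an illustrative stand-in, not the paper's loss.
    """
    total = 0.0
    triples = zip(voicing_probs, voicing_probs[1:], voicing_probs[2:])
    for prev, cur, nxt in triples:
        total += (1 - prev) * cur * (1 - nxt)  # lone vocal frame
        total += prev * (1 - cur) * nxt        # lone non-vocal frame
    return total / max(len(voicing_probs) - 2, 1)


if __name__ == "__main__":
    signal = [1.0, 0.5, 0.25, 0.125]
    radius = 0.9  # assumed value, chosen only for the demonstration
    weighted = [signal[n] * radius ** (-n) for n in range(len(signal))]
    # Largest gap between the two computations; it should be near zero.
    print(max(abs(a - b) for a, b in
              zip(z_transform_on_circle(signal, radius), dft(weighted))))
    print(isolated_frame_penalty([1.0, 1.0, 0.0, 1.0, 1.0]))
```

A radius other than 1 reweights the samples exponentially before the transform, which is one way a transform can be made to emphasize or de-emphasize parts of its input. The penalty is zero for contours with no isolated frame and grows as predictions flip for a single frame, so it could in principle be added to a training objective alongside the usual classification loss.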

Code & Models

Repositories

smoothken/kknet (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies