Modeling Singing F0 With Neural Network Driven Transition-Sustain Models

Kanru Hua

arXiv:1803.04030 · eess.AS · March 13, 2018 · 1 citation

Open Access

TL;DR

This paper introduces a neural network approach for generating singing-voice F0 curves from musical scores. A transition model and a sustain model are combined to produce a continuous F0 curve that captures vibratos and note-boundary details.

Contribution

It proposes a neural network framework that models note transitions and sustained-note vibratos with separate networks, targeting the vibrato and note-boundary details that traditional statistical parametric methods tend to oversmooth.

Findings

01. Subjective tests show high similarity to original singing performances.

02. Models effectively reproduce vibratos and note boundary details.

03. Approach outperforms traditional statistical parametric methods.

Abstract

This study focuses on generating fundamental frequency (F0) curves of singing voice from musical scores stored in a MIDI-like notation. Current statistical parametric approaches to singing F0 modeling have difficulty reproducing vibratos and the temporal details at note boundaries because statistical models tend to oversmooth. This paper presents a neural-network-based solution that models one pair of neighboring notes at a time (the transition model) and uses a separate network to generate vibratos (the sustain model). Predictions from the two models are combined by summation after proper enveloping to enforce continuity. In the training phase, mild misalignment between the scores and the target F0 is addressed by back-propagating the gradients to the networks' inputs. Subjective listening tests on the NITech singing database show that transition-sustain models are able…
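The combination step described in the abstract (per-note-pair transition predictions plus a separate vibrato prediction, summed after enveloping) can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the sigmoid glide stands in for the transition network, the Hann-windowed sinusoid stands in for the sustain network, and the linear crossfade, the function names, and the default parameters (frame rate, vibrato rate and depth) are hypothetical. Pitches are in semitones (MIDI note numbers).

```python
import numpy as np

def transition_segment(p0, p1, n0, n1, width=5.0):
    """Stand-in for the transition model: a sigmoid glide from pitch p0 to p1
    covering both notes, centred on the boundary between them."""
    t = np.arange(n0 + n1) - n0              # frame index relative to the boundary
    return p0 + (p1 - p0) / (1.0 + np.exp(-t / width))

def sustain_vibrato(n, depth, rate_hz, frame_rate):
    """Stand-in for the sustain model: a sinusoid under a Hann envelope,
    so the vibrato vanishes at both ends of the note."""
    t = np.arange(n) / frame_rate
    return depth * np.hanning(n) * np.sin(2.0 * np.pi * rate_hz * t)

def generate_f0(notes, depth=0.3, rate_hz=5.5, frame_rate=200):
    """notes: list of (pitch, n_frames) with at least two entries.
    Returns one continuous F0 curve in semitones."""
    lengths = [n for _, n in notes]
    starts = np.concatenate(([0], np.cumsum(lengths)))
    f0 = np.zeros(starts[-1])
    last = len(notes) - 2
    for i in range(len(notes) - 1):
        (p0, n0), (p1, n1) = notes[i], notes[i + 1]
        # Neighbouring segments share one note; their linear fades sum to one
        # there, so the overlap is a crossfade rather than a doubling.
        fade_in = np.ones(n0) if i == 0 else np.linspace(0.0, 1.0, n0)
        fade_out = np.ones(n1) if i == last else np.linspace(1.0, 0.0, n1)
        weights = np.concatenate((fade_in, fade_out))
        f0[starts[i]:starts[i + 2]] += weights * transition_segment(p0, p1, n0, n1)
    # Add the enveloped vibrato of each note on top of the transition curve.
    for (_, n), s in zip(notes, starts):
        f0[s:s + n] += sustain_vibrato(n, depth, rate_hz, frame_rate)
    return f0

notes = [(60, 100), (62, 100), (59, 100)]    # three notes of 100 frames each
f0 = generate_f0(notes)
```

Because each envelope reaches zero where its segment ends, the summed curve has no jumps at segment edges, which is the continuity property the enveloping is meant to enforce.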

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing