Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis

Tae-Woo Kim; Min-Su Kang; Gyeong-Hoon Lee

arXiv:2206.11558·eess.AS·June 14, 2024·1 cites

Open Access

TL;DR

This paper introduces a multi-task learning approach combining parametric and neural vocoder features to improve disentanglement of timbre and pitch in singing voice synthesis, resulting in more natural and controllable singing voices.

Contribution

It proposes a novel multi-task learning model that uses both parametric and mel-spectrogram features, enhancing voice quality and feature disentanglement in singing synthesis.

Findings

01. Generated singing voices are more natural than those produced by single-task models.

02. The model effectively disentangles the timbre and pitch components.

03. The model outperforms conventional parametric vocoder-based models.

Abstract

Recently, deep learning-based generative models have been introduced to generate singing voices. One approach is to predict parametric vocoder features, which consist of explicit speech parameters. This approach has the advantage that the meaning of each feature is explicitly distinguished. Another approach is to predict mel-spectrograms for a neural vocoder. However, parametric vocoders are limited in voice quality, and mel-spectrogram features are difficult to model because the timbre and pitch information are entangled. In this study, we propose a singing voice synthesis model with multi-task learning that uses both approaches: acoustic features for a parametric vocoder and mel-spectrograms for a neural vocoder. By using the parametric vocoder features as auxiliary features, the proposed model can efficiently disentangle and control the timbre and pitch components of the…
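The multi-task idea in the abstract can be illustrated with a minimal sketch: a shared representation feeds two output heads, one predicting mel-spectrogram frames (the main task) and one predicting parametric vocoder features (the auxiliary task), and the two losses are combined with a weight. This is not the authors' architecture. The linear layers, the dimensions, the L1 loss, the `aux_weight` value, and every function name below are assumptions made for illustration, and the adversarial component named in the title is omitted because the truncated abstract does not describe it.

```python
import random

def linear(x, w):
    """Apply a weight matrix w (one row per output) to the vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def l1(pred, target):
    """Mean absolute error between two equal-length vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def init(rows, cols, rng):
    """Small random weight matrix (hypothetical initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def multitask_loss(score_feats, mel_target, voc_target, params, aux_weight=0.5):
    """Weighted sum of the main (mel) loss and the auxiliary (vocoder) loss."""
    hidden = linear(score_feats, params["shared"])   # shared representation
    mel_pred = linear(hidden, params["mel_head"])    # main task: mel frame
    voc_pred = linear(hidden, params["voc_head"])    # auxiliary task: vocoder features
    mel_loss = l1(mel_pred, mel_target)
    voc_loss = l1(voc_pred, voc_target)
    return mel_loss + aux_weight * voc_loss, mel_loss, voc_loss

# Toy dimensions (assumed): 4 input features, 8 hidden units,
# 6 mel bins, 3 parametric vocoder features.
rng = random.Random(0)
params = {
    "shared": init(8, 4, rng),
    "mel_head": init(6, 8, rng),
    "voc_head": init(3, 8, rng),
}
score_feats = [0.1, 0.2, 0.3, 0.4]
mel_target = [0.0] * 6
voc_target = [0.0] * 3

total, mel_loss, voc_loss = multitask_loss(score_feats, mel_target, voc_target, params)
print(total, mel_loss, voc_loss)
```

Setting `aux_weight=0.0` reduces the sketch to a single-task mel-spectrogram model, which is the kind of baseline the findings compare against.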

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies