PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

Chengqi Deng; Chengzhu Yu; Heng Lu; Chao Weng; Dong Yu

arXiv:1912.01852·cs.SD·February 19, 2020·1 cites

Open Access

TL;DR

PitchNet introduces an adversarial training approach to unsupervised singing voice conversion, significantly improving pitch accuracy and allowing flexible pitch control in generated singing voices.

Contribution

It advances unsupervised singing voice conversion by incorporating an adversarial pitch regression network for better pitch modeling and control.

Findings

01 Mean opinion score (MOS) improved from 2.92 to 3.75.

02 Enhanced pitch accuracy and controllability.

03 Better quality of converted singing voices.

Abstract

Singing voice conversion converts one singer's voice into another's without changing the sung content. Recent work shows that unsupervised singing voice conversion can be achieved with an autoencoder-based approach [1]. However, the converted singing voice can easily go out of key, showing that the existing approach cannot model pitch information precisely. In this paper, we extend the unsupervised singing voice conversion method proposed in [1] to achieve more accurate pitch translation and flexible pitch manipulation. Specifically, the proposed PitchNet adds an adversarially trained pitch regression network that forces the encoder network to learn a pitch-invariant phoneme representation, and a separate module that feeds pitch extracted from the source audio to the decoder network. Our evaluation shows that the proposed method can greatly improve the…
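The two ideas in the abstract, an adversarial pitch regressor and a pitch input to the decoder, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formulation, so the sign convention, the `adv_weight` parameter, and every function name below are assumptions chosen for illustration. The pitch regressor tries to predict pitch from the encoder output, while the encoder is rewarded when that prediction fails. The semitone shift shows one plausible way a pitch contour could be manipulated before it reaches the decoder.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("sequences must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pitch_regressor_loss(predicted_pitch, true_pitch):
    """Objective for the (hypothetical) pitch regressor: predict pitch
    from the encoder output as accurately as possible."""
    return mse(predicted_pitch, true_pitch)


def encoder_decoder_loss(recon_loss, predicted_pitch, true_pitch, adv_weight=0.1):
    """Assumed adversarial objective for the encoder/decoder: minimize the
    reconstruction loss while maximizing the regressor's error, which pushes
    the encoder output toward carrying no pitch information."""
    return recon_loss - adv_weight * mse(predicted_pitch, true_pitch)


def shift_pitch(f0_hz, semitones):
    """Shift a pitch contour (Hz per frame) by a number of semitones.
    Frames with f0 == 0 (a common convention for unvoiced frames, assumed
    here) stay at 0 because the shift is multiplicative."""
    factor = 2.0 ** (semitones / 12.0)
    return [f * factor for f in f0_hz]


if __name__ == "__main__":
    true_f0 = [220.0, 0.0, 440.0]
    guessed_f0 = [200.0, 0.0, 400.0]
    print("regressor loss:", pitch_regressor_loss(guessed_f0, true_f0))
    print("encoder loss:", encoder_decoder_loss(1.0, guessed_f0, true_f0))
    print("one octave up:", shift_pitch(true_f0, 12))
```

In a real system the two losses would be minimized in alternation by separate optimizers, and the shifted contour would be passed to the decoder in place of the pitch extracted from the source audio.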

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing