Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer

Shahan Nercessian

arXiv:2208.07282 · eess.AS · May 9, 2023 · 6 cites

Open Access

TL;DR

This paper introduces a differentiable WORLD synthesizer for neural vocoding, enabling end-to-end audio style transfer with improved naturalness and disentangled pitch and timbre modeling, applicable to voice conversion and timbre transfer.

Contribution

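The paper presents a differentiable synthesizer with no model parameters that can be embedded in end-to-end audio style transfer models. Its acoustic feature parameterization disentangles pitch from timbre, loss functions defined on those acoustic features improve training stability, and optional lightweight postnets refine the baseline output.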

Findings

01

The baseline synthesizer achieves adequate synthesis quality with no model parameters.

02

Extracting the source excitation spectrum directly can improve naturalness, albeit for a narrower class of style transfer applications.

03

Enhanced training stability with acoustic feature-based loss functions.

Abstract

In this paper, we propose a differentiable WORLD synthesizer and demonstrate its use in end-to-end audio style transfer tasks such as (singing) voice conversion and the DDSP timbre transfer task. Accordingly, our baseline differentiable synthesizer has no model parameters, yet it yields adequate synthesis quality. We can extend the baseline synthesizer by appending lightweight black-box postnets which apply further processing to the baseline output in order to improve fidelity. An alternative differentiable approach considers extraction of the source excitation spectrum directly, which can improve naturalness albeit for a narrower class of style transfer applications. The acoustic feature parameterization used by our approaches has the added benefit that it naturally disentangles pitch and timbral information so that they can be modeled separately. Moreover, as there exists a robust…
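The signal flow the abstract describes can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `world_like_synth`, the thresholded pulse train, the zero-phase frame-wise filtering, and the treatment of aperiodicity as a power ratio are all assumptions made for illustration. It is written in NumPy, so no gradients are computed; a differentiable version would need the same operations expressed in an autodiff framework, and the hard pulse-train threshold used here is not differentiable with respect to pitch. The point is only to show a parameter-free source-filter synthesizer in which pitch (`f0`) drives the excitation while timbre (`sp`, `ap`) shapes it separately.

```python
import numpy as np

def world_like_synth(f0, sp, ap, sr=16000, hop=64, seed=0):
    """Hypothetical WORLD-style source-filter sketch (no learnable parameters).

    f0: (T,) fundamental frequency in Hz, 0 for unvoiced frames
    sp: (T, K) power spectral envelope per frame
    ap: (T, K) aperiodicity in [0, 1], treated here as a power ratio (assumption)
    """
    f0 = np.asarray(f0, dtype=float)
    sp = np.asarray(sp, dtype=float)
    ap = np.asarray(ap, dtype=float)
    n_frames, n_bins = sp.shape
    n_fft = 2 * (n_bins - 1)
    n_samples = n_frames * hop

    # Pitch path: hold f0 over each hop, integrate to a cycle count, and
    # emit a pulse whenever the cycle count wraps past an integer.
    f0_up = np.repeat(f0, hop)
    wrapped = np.mod(np.cumsum(f0_up / sr), 1.0)
    pulses = (np.diff(wrapped, prepend=0.0) < 0).astype(float)
    # Noise path: white noise for the aperiodic component.
    noise = np.random.default_rng(seed).standard_normal(n_samples)

    # Timbre path: filter both excitations frame by frame, then overlap-add.
    win = np.hanning(n_fft)
    pad = n_fft // 2
    exc_p = np.pad(pulses, pad)
    exc_n = np.pad(noise, pad)
    out = np.zeros(n_samples + n_fft)
    for t in range(n_frames):
        s = t * hop
        h_p = np.sqrt(sp[t] * (1.0 - ap[t]))  # periodic magnitude response
        h_n = np.sqrt(sp[t] * ap[t])          # aperiodic magnitude response
        spec = (np.fft.rfft(exc_p[s:s + n_fft] * win) * h_p
                + np.fft.rfft(exc_n[s:s + n_fft] * win) * h_n)
        out[s:s + n_fft] += np.fft.irfft(spec, n=n_fft)
    # Trim the padding and approximately undo the window overlap gain.
    return out[pad:pad + n_samples] * hop / win.sum()

# Example: 50 voiced frames at 200 Hz with a flat envelope.
frames, bins = 50, 129
audio = world_like_synth(np.full(frames, 200.0),
                         np.ones((frames, bins)),
                         np.full((frames, bins), 0.1))
```

Because `f0` only affects the excitation and `sp`/`ap` only affect the filtering, a style transfer model could in principle predict or modify each feature independently, which is the disentanglement the abstract refers to.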

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques

Methods: Differentiable Digital Signal Processing