Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer
Shahan Nercessian

TL;DR
This paper introduces a differentiable WORLD synthesizer for neural vocoding, enabling end-to-end audio style transfer with improved naturalness and disentangled pitch and timbre modeling, applicable to voice conversion and timbre transfer.
Contribution
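The paper presents a parameter-free differentiable WORLD synthesizer that can be integrated into end-to-end models for audio style transfer. Its acoustic feature parameterization disentangles pitch from timbre, optional lightweight postnets can be appended to improve fidelity, and acoustic feature-based losses help stabilize training.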
Findings
The baseline synthesizer achieves adequate synthesis quality with no model parameters.
Extracting the source excitation spectrum directly improves naturalness, though for a narrower class of style transfer applications.
Loss functions based on acoustic features improve training stability.
Abstract
In this paper, we propose a differentiable WORLD synthesizer and demonstrate its use in end-to-end audio style transfer tasks such as (singing) voice conversion and the DDSP timbre transfer task. Accordingly, our baseline differentiable synthesizer has no model parameters, yet it yields adequate synthesis quality. We can extend the baseline synthesizer by appending lightweight black-box postnets which apply further processing to the baseline output in order to improve fidelity. An alternative differentiable approach considers extraction of the source excitation spectrum directly, which can improve naturalness albeit for a narrower class of style transfer applications. The acoustic feature parameterization used by our approaches has the added benefit that it naturally disentangles pitch and timbral information so that they can be modeled separately. Moreover, as there exists a robust…
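To make the pitch/timbre separation described in the abstract concrete, here is a minimal sketch. It is not the WORLD algorithm and not the paper's implementation: it is a simplified sum-of-harmonics stand-in that ignores aperiodicity, and the names `synthesize_harmonics` and `toy_envelope` are hypothetical. It only illustrates that a synthesizer with no learnable parameters can take pitch (`f0_hz`) and timbre (a spectral envelope) as separate inputs.

```python
import math


def toy_envelope(freq_hz):
    """Hypothetical smooth spectral envelope (timbre): amplitude decays with frequency."""
    return 1.0 / (1.0 + (freq_hz / 1000.0) ** 2)


def synthesize_harmonics(f0_hz, envelope, sample_rate=16000, num_samples=160):
    """Sum the harmonics of f0, weighting each by the envelope at its frequency.

    f0_hz: fundamental frequency (pitch) in Hz, held constant over the frame.
    envelope: function mapping a frequency in Hz to a linear amplitude (timbre).
    There are no learnable parameters; the output depends only on the inputs.
    """
    nyquist = sample_rate / 2.0
    # Keep only harmonics strictly below the Nyquist frequency to avoid aliasing.
    harmonics = [k * f0_hz for k in range(1, int(nyquist / f0_hz) + 1)
                 if k * f0_hz < nyquist]
    amplitudes = [envelope(freq) for freq in harmonics]

    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        sample = sum(amp * math.sin(2.0 * math.pi * freq * t)
                     for amp, freq in zip(amplitudes, harmonics))
        frame.append(sample)
    return frame


# Same timbre, two different pitches: only f0 changes between the calls.
low = synthesize_harmonics(200.0, toy_envelope)
high = synthesize_harmonics(400.0, toy_envelope)
```

Because each output sample is a weighted sum of sinusoids, it is a smooth function of the envelope amplitudes. Written in an automatic differentiation framework, the same computation would let gradients flow from a loss on the waveform back to the acoustic features, which is the general idea behind differentiable synthesizers of this kind.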
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
Methods: Differentiable Digital Signal Processing
