DDSP-QbE++: Improving Speech Quality for Speech Anonymisation for Atypical Speech
Suhita Ghosh, Yamini Sinha, Sebastian Stober

TL;DR
This paper enhances DDSP-QbE speech synthesis by introducing voicing detection and PolyBLEP correction, reducing artefacts and improving naturalness in speech anonymisation, especially for atypical speech.
Contribution
It proposes two novel, lightweight modifications to the DDSP-QbE synthesizer that significantly improve speech quality without adding learnable parameters.
Findings
Reduced aliasing artefacts and spectral distortion.
Improved perceptual naturalness measured by MOS.
Seamless integration into existing training pipeline.
Abstract
Differentiable Digital Signal Processing (DDSP) pipelines for voice conversion rely on subtractive synthesis, where a periodic excitation signal is shaped by a learned spectral envelope to reconstruct the target voice. In DDSP-QbE, the excitation is generated via phase accumulation, producing a sawtooth-like waveform whose abrupt discontinuities introduce aliasing artefacts that manifest perceptually as buzziness and spectral distortion, particularly at higher fundamental frequencies. We propose two targeted improvements to the excitation stage of the DDSP-QbE subtractive synthesizer. First, we incorporate explicit voicing detection to gate the harmonic excitation, suppressing the periodic component in unvoiced regions and replacing it with filtered noise, thereby avoiding aliased harmonic content where it is most perceptually disruptive. Second, we apply Polynomial Band-Limited Step…
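The excitation stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard two-sample PolyBLEP correction commonly used for sawtooth oscillators, a per-sample binary voicing flag, and plain white noise standing in for the filtered noise the abstract mentions. The names `polyblep_residual` and `excitation` and all of their parameters are hypothetical.

```python
import random


def polyblep_residual(phase, dt):
    """Two-sample polynomial correction around a phase wrap.

    phase: normalised phase in [0, 1); dt: phase increment per sample (f0 / sr).
    Assumes dt < 0.5, i.e. f0 below half the Nyquist frequency.
    """
    if dt <= 0.0:
        return 0.0
    if phase < dt:  # sample just after the wrap
        x = phase / dt
        return 2.0 * x - x * x - 1.0
    if phase > 1.0 - dt:  # sample just before the wrap
        x = (phase - 1.0) / dt
        return x * x + 2.0 * x + 1.0
    return 0.0


def excitation(f0, voiced, sr, use_polyblep=True, seed=0):
    """Sawtooth excitation via phase accumulation, gated by a voicing flag.

    f0: per-sample fundamental frequency in Hz; voiced: per-sample booleans.
    Unvoiced samples get white noise (a stand-in for filtered noise).
    """
    rng = random.Random(seed)
    phase = 0.0
    out = []
    for hz, is_voiced in zip(f0, voiced):
        dt = hz / sr
        phase = (phase + dt) % 1.0      # phase accumulation
        noise = rng.uniform(-1.0, 1.0)  # drawn every sample for reproducibility
        if not is_voiced:
            out.append(noise)           # voicing gate: no harmonics here
            continue
        saw = 2.0 * phase - 1.0         # naive sawtooth with a hard step
        if use_polyblep:
            saw -= polyblep_residual(phase, dt)  # smooth the step
        out.append(saw)
    return out
```

With a constant 1900 Hz pitch at a 16 kHz sample rate, the corrected waveform has a smaller maximum sample-to-sample jump than the naive sawtooth. That hard step is the source of the aliasing the abstract attributes to the uncorrected excitation.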
