A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, Charalampos Saitis

TL;DR
This survey reviews differentiable digital signal processing techniques in music and speech synthesis, highlighting applications, implemented operations, challenges, and future research directions.
Contribution
It provides a comprehensive overview of differentiable DSP methods in audio synthesis, cataloguing their applications and the signal processing operations that have been implemented differentiably, and discussing open challenges and directions for future research.
Findings
Differentiable DSP backpropagates loss function gradients through digital signal processors, which allows them to be integrated into neural networks.
Applications include music rendering, sound matching, and voice transformation.
Open challenges include optimisation pathologies, robustness to real-world conditions, and design trade-offs.
Abstract
The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably. Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.
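The core idea in the abstract, a loss gradient flowing back through a signal processor to its parameters, can be illustrated with a small sketch. This is not code from the survey: the oscillator, the mean squared error loss, the hand-derived gradient, and all names and constants (`oscillator`, `fit_amplitude`, `SAMPLE_RATE`, and so on) are assumptions chosen for illustration. In practice an automatic differentiation framework would compute the gradient; it is written out by hand here so the example needs only the Python standard library. Amplitude is used as the learnable parameter because the loss is quadratic in it, which keeps the optimisation well behaved.

```python
import math

# Illustrative constants (assumptions, not taken from the survey).
SAMPLE_RATE = 16000   # samples per second
NUM_SAMPLES = 256     # length of the rendered signal
FREQ_HZ = 440.0       # fixed oscillator frequency


def oscillator(amplitude, freq_hz=FREQ_HZ, n=NUM_SAMPLES, sr=SAMPLE_RATE):
    """Sinusoidal oscillator: y[t] = amplitude * sin(2*pi*f*t/sr)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * t / sr)
            for t in range(n)]


def mse(pred, target):
    """Mean squared error between two equal-length signals."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def mse_grad_wrt_amplitude(amplitude, target):
    """Gradient of the MSE loss with respect to the amplitude.

    Chain rule: dL/da = mean(2 * (y[t] - target[t]) * dy[t]/da),
    where dy[t]/da = sin(2*pi*f*t/sr) is the unit-amplitude signal.
    """
    basis = oscillator(1.0)                    # dy/da for every sample
    pred = [amplitude * b for b in basis]      # forward pass
    return sum(2.0 * (p - q) * b
               for p, q, b in zip(pred, target, basis)) / len(basis)


def fit_amplitude(target, init=0.0, lr=0.5, steps=200):
    """Recover the amplitude by plain gradient descent on the MSE loss."""
    a = init
    for _ in range(steps):
        a -= lr * mse_grad_wrt_amplitude(a, target)
    return a


# Render a target with a "hidden" amplitude, then recover it from audio alone.
target = oscillator(0.8)
estimate = fit_amplitude(target)
```

Calling `print(estimate, mse(oscillator(estimate), target))` after running the sketch shows the recovered amplitude and the remaining error. In a neural synthesis setting the amplitude would typically be predicted by a network rather than optimised directly, with the same gradient passed on to the network weights.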
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
