Audio Generation Through Score-Based Generative Modeling: Design Principles and Implementation
Ge Zhu, Yutong Wen, Zhiyao Duan

TL;DR
This paper reviews diffusion-based generative models for audio, emphasizing design principles and implementation guidance, and provides an open-source framework demonstrated through diverse audio applications.
Contribution
It offers a comprehensive review of diffusion model design principles for audio, introduces an open-source codebase, and evaluates applications like audio generation and speech synthesis.
Findings
Effective diffusion model configurations for audio quality
Open-source framework facilitates reproducibility and rapid prototyping
Benchmark results demonstrate model versatility across tasks
Abstract
Diffusion models have emerged as powerful deep generative techniques, producing high-quality and diverse samples in various domains, including audio. While existing reviews provide overviews, in-depth discussion of the specific design choices of these models remains limited. The audio diffusion model literature also lacks principled guidance for implementing these design choices and comparing them across applications. This survey provides a comprehensive review of diffusion model design with an emphasis on design principles for quality improvement and conditioning for audio applications. We adopt the score modeling perspective as a unifying framework that accommodates various interpretations, including recent approaches like flow matching. We systematically examine the training and sampling procedures of diffusion models, and audio applications through…
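The abstract frames diffusion training and sampling through the score-modeling lens. The toy below is an illustrative sketch of that view only. It is not the authors' open-source framework.

It assumes 1-D Gaussian data, so the score of every noised marginal is known in closed form and no network is needed. It shows two pieces:

- the denoising score matching (DSM) training objective;
- an Euler solver for the variance-exploding probability-flow ODE, used as the sampler.

All names (`true_score`, `dsm_loss`, `sample_pf_ode`) and constants (`MU`, `S`, `sigma_max`, `steps`) are hypothetical.

```python
import numpy as np

# Hypothetical toy setup (assumption, not from the paper): data x0 ~ N(MU, S**2),
# so the noised marginal p_sigma = N(MU, S**2 + sigma**2) has a closed-form score.
MU, S = 2.0, 0.5

def true_score(x, sigma):
    """grad_x log p_sigma(x) for the Gaussian toy data (stands in for a trained network)."""
    return -(x - MU) / (S**2 + sigma**2)

def dsm_loss(score_fn, x0, sigma, rng):
    """Denoising score matching loss at a single noise level sigma.

    x0 is perturbed as x_t = x0 + sigma * eps, and the model score is regressed
    toward -eps / sigma, the score of the kernel N(x0, sigma^2). The sigma^2
    weighting keeps the loss scale comparable across noise levels.
    """
    eps = rng.standard_normal(x0.shape)
    xt = x0 + sigma * eps
    target = -eps / sigma
    return np.mean(sigma**2 * (score_fn(xt, sigma) - target) ** 2)

def sample_pf_ode(score_fn, n, sigma_max=20.0, sigma_min=1e-3, steps=500, rng=None):
    """Euler integration of the variance-exploding probability-flow ODE
    dx/dsigma = -sigma * score(x, sigma), from sigma_max down to sigma_min."""
    if rng is None:
        rng = np.random.default_rng(0)
    sigmas = np.geomspace(sigma_max, sigma_min, steps + 1)  # decreasing noise levels
    x = sigma_max * rng.standard_normal(n)  # prior approx. N(0, sigma_max^2)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = -s_cur * score_fn(x, s_cur)  # ODE drift dx/dsigma at the current level
        x = x + (s_next - s_cur) * d     # Euler step; s_next < s_cur
    return x
```

With the exact score, the DSM loss at noise level sigma equals S^2 / (S^2 + sigma^2) rather than zero, because the regression target is itself noisy; its minimizer is still the true score. Sampling with `sample_pf_ode(true_score, 20_000)` should give samples with mean near `MU` and standard deviation near `S`. A slight bias in the mean comes from starting at N(0, sigma_max^2) instead of the exact noised marginal. In a real system, a neural network conditioned on sigma would replace `true_score`.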
Taxonomy
Topics: Music Technology and Sound Studies · Speech and Audio Processing · Music and Audio Processing
