Progressive distillation diffusion for raw music generation

Svetlana Pavlova

arXiv:2307.10994·cs.SD·July 21, 2023

Open Access

TL;DR

This paper introduces a novel diffusion-based deep learning model for raw music generation, demonstrating its ability to generate and process audio in waveform and spectrogram domains with promising results.

Contribution

It applies progressive distillation diffusion with a 1D U-Net to music generation, a novel approach in the waveform domain, and compares different diffusion parameters to find the settings that give the best results.
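To make the term concrete, the core of progressive distillation can be sketched in a few lines: a student model is trained so that one of its sampling steps matches two deterministic (DDIM) steps of a teacher. The sketch below follows the general formulation of the technique and is not taken from this paper. The cosine noise schedule, the x-prediction parameterization, the stand-in `teacher` callable, and all function names are assumptions made for illustration; in the paper's setting the teacher would be a trained 1D U-Net.

```python
import numpy as np


def alpha_sigma(t):
    """Assumed cosine schedule: alpha_t = cos(pi*t/2), sigma_t = sin(pi*t/2)."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def ddim_step(z_t, x_hat, t, s):
    """Deterministic DDIM update from time t to an earlier time s,
    given the model's prediction x_hat of the clean signal."""
    alpha_t, sigma_t = alpha_sigma(t)
    alpha_s, sigma_s = alpha_sigma(s)
    return alpha_s * x_hat + (sigma_s / sigma_t) * (z_t - alpha_t * x_hat)


def distillation_target(teacher, z_t, t, num_student_steps):
    """Clean-signal target for the student at time t.

    The teacher takes two DDIM steps of half the student's step size.
    The target is the x-prediction that would carry the student from
    z_t to the same end point in a single DDIM step.
    """
    t_mid = t - 0.5 / num_student_steps
    t_end = t - 1.0 / num_student_steps

    # Two teacher steps: t -> t_mid -> t_end.
    z_mid = ddim_step(z_t, teacher(z_t, t), t, t_mid)
    z_end = ddim_step(z_mid, teacher(z_mid, t_mid), t_mid, t_end)

    # Invert a single student DDIM step t -> t_end for x_hat.
    alpha_t, sigma_t = alpha_sigma(t)
    alpha_end, sigma_end = alpha_sigma(t_end)
    ratio = sigma_end / sigma_t
    return (z_end - ratio * z_t) / (alpha_end - ratio * alpha_t)


# Toy usage: a hypothetical "perfect" teacher that always returns the clean
# waveform x0, standing in for a trained network.
rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))   # short toy waveform
alpha, sigma = alpha_sigma(0.5)
z_t = alpha * x0 + sigma * rng.standard_normal(16)  # noisy input at t = 0.5
target = distillation_target(lambda z, t: x0, z_t, 0.5, num_student_steps=4)
```

In a full training loop the student's prediction at `z_t` would be regressed onto `target`, the student would then become the next teacher, and the number of sampling steps would be halved in each round.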

Findings

01

Model effectively generates raw audio and mel-spectrograms.

02

Diffusion parameters significantly impact generation quality.

03

Model handles multi-channel audio processing and looped generation.

Abstract

This paper applies a new deep learning approach to the task of generating raw audio files. The approach is based on diffusion models, a recent type of deep generative model that has shown outstanding results in image generation. The computer vision community has given these models a lot of attention; by contrast, very little attention has gone to other applications such as music generation in the waveform domain. In this paper, a model for unconditional music generation is implemented: progressive distillation diffusion with a 1D U-Net. A comparison of different diffusion parameters and their effect on the overall result is then presented. One big advantage of the methods implemented in this work is that the model can handle progressing audio processing and generation, using transformation from 1-channel 128 x…

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis

Methods: Concatenated Skip Connection · Convolution · Focus · Diffusion · Max Pooling · U-Net