Progressive distillation diffusion for raw music generation
Svetlana Pavlova

TL;DR
This paper introduces a novel diffusion-based deep learning model for raw music generation, demonstrating its ability to generate and process audio in waveform and spectrogram domains with promising results.
Contribution
It applies progressive distillation diffusion with a 1D U-Net to music generation, an approach that is new in the waveform domain, and compares different diffusion parameters to see how they affect the results.
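As a rough illustration of the "progressive" part, progressive distillation is generally described as repeatedly training a student sampler that uses half as many steps as its teacher. The sketch below only enumerates those rounds. The function name `distillation_rounds` and the step counts 1024 and 4 are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def distillation_rounds(teacher_steps: int, final_steps: int = 1) -> List[Tuple[int, int]]:
    """List (teacher_steps, student_steps) pairs, halving the step count each round.

    Hypothetical helper for illustration only; the paper's actual step
    counts and number of rounds are not given in this summary.
    """
    if final_steps < 1 or teacher_steps < final_steps:
        raise ValueError("need teacher_steps >= final_steps >= 1")
    rounds = []
    n = teacher_steps
    while n > final_steps:
        # The student of this round becomes the teacher of the next one.
        student = max(n // 2, final_steps)
        rounds.append((n, student))
        n = student
    return rounds


if __name__ == "__main__":
    for teacher, student in distillation_rounds(1024, 4):
        print(f"teacher: {teacher:4d} steps -> student: {student:4d} steps")
```

Each round would involve a full training pass of the student against the teacher, which this sketch leaves out.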
Findings
The model generates both raw audio and mel-spectrograms.
The choice of diffusion parameters significantly affects generation quality.
The model handles multi-channel audio processing and looped generation.
Abstract
This paper applies a new deep learning approach to the task of generating raw audio files. It is based on diffusion models, a recent type of deep generative model. This type of method has recently shown outstanding results in image generation and has received a lot of attention from the computer vision community. By contrast, very little attention has been given to other applications such as music generation in the waveform domain. In this paper, a model for unconditional generation applied to music is implemented: progressive distillation diffusion with a 1D U-Net. A comparison of different diffusion parameters and their contribution to the overall result is then presented. One big advantage of the methods implemented in this work is that the model is able to deal with progressive audio processing and generation, using transformation from 1-channel 128 x…
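To make "diffusion in the waveform domain" concrete, here is a minimal sketch of the forward (noising) step applied directly to a 1D list of audio samples. It assumes a cosine variance-preserving schedule, which is a common choice but is not confirmed by this summary. The names `cosine_schedule` and `add_noise`, the 440 Hz tone, and the 16 kHz sample rate are all hypothetical.

```python
import math
import random
from typing import List, Tuple


def cosine_schedule(t: float) -> Tuple[float, float]:
    """Return (alpha, sigma) for diffusion time t in [0, 1].

    Assumed schedule: alpha = cos(pi*t/2), sigma = sin(pi*t/2),
    so alpha**2 + sigma**2 == 1 (variance preserving).
    """
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def add_noise(waveform: List[float], t: float, rng: random.Random) -> List[float]:
    """Forward diffusion on raw samples: x_t = alpha * x + sigma * eps."""
    alpha, sigma = cosine_schedule(t)
    return [alpha * x + sigma * rng.gauss(0.0, 1.0) for x in waveform]


if __name__ == "__main__":
    sample_rate = 16000  # hypothetical sample rate
    # A short 440 Hz sine tone standing in for a music clip.
    clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(8)]
    rng = random.Random(0)
    for t in (0.0, 0.5, 1.0):
        noisy = add_noise(clean, t, rng)
        print(t, [round(v, 3) for v in noisy])
```

At t = 0 the clip is returned unchanged, and at t = 1 it is almost pure Gaussian noise. A generative model of this kind is trained to reverse that corruption.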
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
Methods: Concatenated Skip Connection · Convolution · Focus · Diffusion · Max Pooling · U-Net
