Msanii: High Fidelity Music Synthesis on a Shoestring Budget

Kinyugo Maina

arXiv:2301.06468 · cs.SD · January 18, 2023 · 1 citation

TL;DR

Msanii is a diffusion-based model that efficiently synthesizes long, high-fidelity stereo music at high sample rates. The author reports that, to the best of their knowledge, it is the first diffusion-based model to synthesize music samples this long at such sample rates.

Contribution

Introduces Msanii, a diffusion-based music synthesis model that generates long (190-second), high-fidelity stereo music at 44.1 kHz. The author states that, to the best of their knowledge, this is the first diffusion-based model to do so.

Findings

01 Synthesizes 190 seconds of stereo music at 44.1 kHz

02 Does not rely on concatenative or cascading synthesis techniques

03 Achieves high-fidelity music synthesis with diffusion models

Abstract

In this paper, we present Msanii, a novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently. Our model combines the expressiveness of mel spectrograms, the generative capabilities of diffusion models, and the vocoding capabilities of neural vocoders. We demonstrate the effectiveness of Msanii by synthesizing tens of seconds (190 seconds) of stereo music at high sample rates (44.1 kHz) without the use of concatenative synthesis, cascading architectures, or compression techniques. To the best of our knowledge, this is the first work to successfully employ a diffusion-based model for synthesizing such long music samples at high sample rates. Our demo can be found at https://kinyugo.github.io/msanii-demo and our code at https://github.com/Kinyugo/msanii.
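To give a sense of the scale the abstract describes, the following minimal sketch works out how many waveform samples a 190-second stereo clip at 44.1 kHz contains, and how many values a mel spectrogram of the same clip would hold. The duration, sample rate, and channel count come from the abstract. The hop length and number of mel bins are hypothetical values chosen only for illustration, not settings confirmed by the paper, and the frame count assumes the common centered-STFT convention.

```python
# Figures stated in the abstract.
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz
DURATION_S = 190          # seconds of audio
NUM_CHANNELS = 2          # stereo

# Hypothetical spectrogram settings (assumptions, not taken from the paper).
HOP_LENGTH = 1024         # waveform samples between successive frames
N_MELS = 128              # mel frequency bins per frame


def waveform_samples(duration_s: int, sample_rate_hz: int) -> int:
    """Number of samples in one channel of the waveform."""
    return duration_s * sample_rate_hz


def mel_frames(num_samples: int, hop_length: int) -> int:
    """Frame count under the centered-STFT convention: 1 + samples // hop."""
    return 1 + num_samples // hop_length


samples_per_channel = waveform_samples(DURATION_S, SAMPLE_RATE_HZ)
total_samples = samples_per_channel * NUM_CHANNELS
frames = mel_frames(samples_per_channel, HOP_LENGTH)
mel_values = NUM_CHANNELS * N_MELS * frames

print(f"samples per channel: {samples_per_channel:,}")
print(f"total waveform samples: {total_samples:,}")
print(f"mel frames per channel: {frames:,}")
print(f"total mel values: {mel_values:,}")
```

Under these assumed settings the mel representation holds roughly one eighth as many values as the raw waveform, which illustrates why operating on mel spectrograms and leaving waveform reconstruction to a vocoder can make long-context synthesis more tractable. Different hop lengths or mel bin counts would change the ratio.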


Code & Models

Repositories

kinyugo/msanii (PyTorch, official)


Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Computer Graphics and Visualization Techniques

Methods: Diffusion