StyleWaveGAN: Style-based synthesis of drum sounds with extensive controls using generative adversarial networks

Antoine Lavault; Axel Roebel; Matthieu Voiry

arXiv:2204.00907 · cs.SD · August 29, 2022 · 1 citation

Open Access

TL;DR

StyleWaveGAN is a style-based generative adversarial network that synthesizes high-quality drum sounds faster than real time. It offers extensive control through conditioning on drum type and audio descriptors, and it improves on the quality of existing drum generators (WaveGAN and NeuroDrum).
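The conditioning mentioned above can be illustrated with a minimal sketch. In StyleGAN, a mapping network turns a latent vector into an intermediate style vector, and that style vector then modulates the synthesis layers. The sketch assumes one common way to condition such a network: concatenate the latent noise with a one-hot drum label and a vector of descriptor values before the mapping MLP. The drum class list, the three descriptor values, the layer sizes, and the helper names (`build_conditioned_input`, `mapping_network`) are all hypothetical. This is not presented as the conditioning scheme StyleWaveGAN actually uses.

```python
import numpy as np

# Hypothetical drum classes: the paper conditions on drum type, but this
# particular list is an illustrative assumption.
DRUM_TYPES = ["kick", "snare", "cymbal"]


def build_conditioned_input(z, drum_type, descriptors):
    """Concatenate latent noise, a one-hot drum label and descriptor values.

    Assumed conditioning scheme for a StyleGAN-like mapping network; not
    necessarily the one used in StyleWaveGAN.
    """
    one_hot = np.zeros(len(DRUM_TYPES), dtype=np.float32)
    one_hot[DRUM_TYPES.index(drum_type)] = 1.0
    return np.concatenate([
        np.asarray(z, dtype=np.float32),
        one_hot,
        np.asarray(descriptors, dtype=np.float32),
    ])


def mapping_network(x, layers):
    """Toy MLP mapping the conditioned input to a style vector w.

    Applies a leaky ReLU (slope 0.2) after every layer except the last.
    """
    for i, (weight, bias) in enumerate(layers):
        x = x @ weight + bias
        if i < len(layers) - 1:
            x = np.where(x > 0, x, 0.2 * x)
    return x


rng = np.random.default_rng(0)
z = rng.standard_normal(128)                              # latent noise
x = build_conditioned_input(z, "snare", [0.3, 0.7, 0.5])  # 3 hypothetical descriptors
dims = [x.shape[0], 256, 256]                             # toy layer widths
layers = [
    (0.05 * rng.standard_normal((m, n)).astype(np.float32), np.zeros(n, dtype=np.float32))
    for m, n in zip(dims[:-1], dims[1:])
]
w = mapping_network(x, layers)
print(x.shape, w.shape)  # (134,) (256,)
```

Changing only the one-hot label or the descriptor values while keeping `z` fixed is what lets a conditioned generator vary drum type or timbre on request.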

Contribution

Introduces StyleWaveGAN, a style-based drum sound generator that enables extensive control, faster synthesis, and improved quality. The paper also proposes an alternative to the progressive growing of GANs and reports experiments on the effect of dataset balancing for generative tasks.

Findings

01

Achieved faster-than-real-time synthesis of drum sounds on a GPU, directly in CD quality, for durations of up to 1.5 s.

02

Demonstrated significantly improved generation quality over WaveGAN and NeuroDrum, as measured by the Fréchet Audio Distance (FAD).

03

Showed effective control over drum type and audio descriptors during synthesis.
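The Fréchet Audio Distance cited in the findings fits a Gaussian to embeddings of real audio and another to embeddings of generated audio. It then computes the Fréchet distance between the two Gaussians: ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}). The original FAD proposal takes the embeddings from a pretrained VGGish classifier. The sketch below assumes the embeddings have already been computed and uses random stand-ins. The helper name `frechet_distance` is hypothetical, and the code is not the evaluation code used in the paper.

```python
import numpy as np


def frechet_distance(emb_a, emb_b):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (n_samples, dim).
    d^2 = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Tr((S_a S_b)^{1/2}) is the sum of the square roots of the eigenvalues
    # of S_a S_b; these are real and non-negative for covariance matrices,
    # so only numerical noise is removed by taking the real part and clipping.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


rng = np.random.default_rng(0)
a = rng.standard_normal((2000, 8))  # stand-in "real" embeddings
b = a + 1.0                         # same covariance, mean shifted by 1 per dimension
print(frechet_distance(a, a))       # close to 0 (identical distributions)
print(frechet_distance(a, b))       # close to 8 (squared mean shift summed over 8 dims)
```

Lower values mean the generated distribution is closer to the reference one. The trace term is sensitive to covariance estimation, so FAD is normally computed on a large number of clips.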

Abstract

In this paper, we introduce StyleWaveGAN, a style-based drum sound generator that is a variation of StyleGAN, a state-of-the-art image generator. By conditioning StyleWaveGAN on both the type of drum and several audio descriptors, we are able to synthesize waveforms faster than real time on a GPU, directly in CD quality, for durations of up to 1.5 s, while retaining a considerable amount of control over the generation. We also introduce an alternative to the progressive growing of GANs and experiment with the effect of dataset balancing on generative tasks. The experiments are carried out on an augmented subset of a publicly available dataset composed of different drums and cymbals. We evaluate against two recent drum generators, WaveGAN and NeuroDrum, demonstrating significantly improved generation quality (measured with the Fréchet Audio Distance) and interesting results with perceptual…
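Two numbers make the "CD quality up to 1.5 s" and "faster than real time" claims concrete. The first is how many waveform samples the generator must output per clip at the standard 44.1 kHz CD sample rate. The second is the real-time factor: synthesis time divided by the duration of audio produced. The helper names below are hypothetical, and the timing used in the example is an invented placeholder, not a measurement reported in the paper.

```python
CD_SAMPLE_RATE = 44_100  # Hz, the standard CD audio sample rate


def num_samples(duration_s, sample_rate=CD_SAMPLE_RATE):
    """Number of waveform samples needed for a clip of the given duration."""
    return round(duration_s * sample_rate)


def real_time_factor(synthesis_time_s, n_clips, clip_duration_s):
    """Wall-clock synthesis time divided by the total audio produced.

    Values below 1.0 mean the generator runs faster than real time.
    """
    return synthesis_time_s / (n_clips * clip_duration_s)


print(num_samples(1.5))  # 66150 samples per 1.5 s clip at 44.1 kHz

# Hypothetical timing, not a measurement from the paper:
# 0.5 s of GPU time to generate a batch of 16 clips of 1.5 s each.
rtf = real_time_factor(0.5, 16, 1.5)
print(f"RTF = {rtf:.3f}")  # RTF = 0.021
```

Generating the full 66,150-sample waveform directly, rather than a spectrogram that must then be inverted, is what "directly in CD quality" refers to. Batching on a GPU further lowers the per-clip real-time factor.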

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing