Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

Jesse Engel; Cinjon Resnick; Adam Roberts; Sander Dieleman; Douglas Eck; Karen Simonyan; Mohammad Norouzi

arXiv:1704.01279 · cs.LG · April 6, 2017 · 298 cites

TL;DR

This paper introduces a WaveNet autoencoder for high-quality musical note synthesis, leveraging a new large-scale dataset called NSynth, and demonstrates its ability to generate realistic, interpolated sounds with meaningful timbre variations.

Contribution

The paper presents a novel WaveNet autoencoder architecture and introduces NSynth, a large-scale dataset, enabling improved audio synthesis and timbre interpolation.

Findings

01 Enhanced audio quality over spectral autoencoders

02 Learned embeddings facilitate instrument morphing

03 Model generates realistic, expressive sounds

Abstract

Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets. In this paper, we offer contributions in both these areas to enable similar progress in audio modeling. First, we detail a powerful new WaveNet-style autoencoder model that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets. Using NSynth, we demonstrate improved qualitative and quantitative performance of the WaveNet autoencoder over a well-tuned spectral autoencoder baseline. Finally, we show that the model learns a manifold of embeddings that allows for morphing between instruments, meaningfully interpolating in timbre to create new types of sounds that are…
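The abstract mentions morphing between instruments by interpolating in the learned embedding space. As a rough illustration only, the sketch below blends two temporal embeddings with a plain linear mix. The function name `interpolate_embeddings`, the array shapes, and the random stand-in embeddings are all hypothetical; the encoder and the autoregressive decoder are not shown, and the paper's exact interpolation procedure may differ.

```python
import numpy as np


def interpolate_embeddings(z_a, z_b, alpha):
    """Linearly blend two temporal embeddings of identical shape (time, channels).

    alpha = 0 returns z_a, alpha = 1 returns z_b, values in between mix the two.
    """
    z_a = np.asarray(z_a, dtype=np.float64)
    z_b = np.asarray(z_b, dtype=np.float64)
    if z_a.shape != z_b.shape:
        raise ValueError("embeddings must share the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * z_a + alpha * z_b


# Hypothetical stand-ins for encoder outputs of two notes
# (shape chosen for illustration: time steps x embedding channels).
rng = np.random.default_rng(0)
z_flute = rng.normal(size=(125, 16))
z_organ = rng.normal(size=(125, 16))

# An equal mix of the two codes; a trained decoder would be conditioned on this.
z_mix = interpolate_embeddings(z_flute, z_organ, 0.5)
```

In an actual system, `z_mix` would be passed as the conditioning signal to a trained decoder to synthesize audio; here it is simply an array that lies halfway between the two inputs.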

Code & Models

Repositories

Datasets

jg583/NSynth
dataset · 76 dl

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies

Methods: Mixture of Logistic Distributions · Dilated Causal Convolution · WaveNet