SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Soroush Mehri; Kundan Kumar; Ishaan Gulrajani; Rithesh Kumar; Shubham Jain; Jose Sotelo; Aaron Courville; Yoshua Bengio

arXiv:1612.07837 · cs.SD · February 14, 2017 · 362 citations

TL;DR

This paper introduces SampleRNN, a hierarchical neural network model for unconditional audio generation that produces audio one sample at a time, captures variation over very long time spans, and yields samples that human evaluators prefer over those of competing models.

Contribution

The paper presents a hierarchical neural architecture for audio generation that combines memory-less modules (autoregressive multilayer perceptrons) with stateful recurrent neural networks.

Findings

01 Model captures long-term temporal variations effectively

02 Human evaluators prefer generated samples from this model

03 Component analysis shows each part's contribution to performance

Abstract

In this paper we propose a novel model for unconditional audio generation that generates one audio sample at a time. We show that our model, which combines memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variation in the temporal sequences over very long time spans on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.
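
The structure described in the abstract can be illustrated with a toy generation loop. The sketch below is not the authors' implementation: it is a minimal, untrained example that assumes two tiers, a recurrent tier that updates its state once per frame and a small memory-less perceptron that predicts one quantized sample at a time. The constants `Q`, `FRAME`, `HIDDEN`, and `MLP_HIDDEN`, the random weights, and every function name are hypothetical choices made for illustration only.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
Q = 16           # number of quantization levels per audio sample
FRAME = 4        # samples per frame consumed by the recurrent tier
HIDDEN = 8       # size of the recurrent state
MLP_HIDDEN = 8   # hidden width of the per-sample perceptron


def rand_matrix(rng, rows, cols, scale=0.5):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_real(sample):
    """Map a quantized sample in [0, Q) to a real value in [-1, 1]."""
    return 2.0 * sample / (Q - 1) - 1.0


def generate(n_samples, seed=0):
    """Generate n_samples quantized values, one sample at a time."""
    rng = random.Random(seed)
    w_in = rand_matrix(rng, HIDDEN, FRAME)                # previous frame -> state
    w_rec = rand_matrix(rng, HIDDEN, HIDDEN)              # state -> state
    w_mlp = rand_matrix(rng, MLP_HIDDEN, HIDDEN + FRAME)  # perceptron layer 1
    w_out = rand_matrix(rng, Q, MLP_HIDDEN)               # perceptron layer 2

    audio = [Q // 2] * FRAME   # mid-level priming frame
    hidden = [0.0] * HIDDEN    # recurrent state, carried across frames

    while len(audio) - FRAME < n_samples:
        # Stateful tier: one recurrent update per frame.
        prev_frame = [to_real(s) for s in audio[-FRAME:]]
        drive = matvec(w_in, prev_frame)
        recur = matvec(w_rec, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(drive, recur)]

        # Memory-less tier: predict each sample from the frame state
        # and the most recent FRAME samples, then sample from the result.
        for _ in range(FRAME):
            recent = [to_real(s) for s in audio[-FRAME:]]
            layer = [math.tanh(z) for z in matvec(w_mlp, hidden + recent)]
            probs = softmax(matvec(w_out, layer))
            audio.append(rng.choices(range(Q), weights=probs)[0])

    return audio[FRAME:FRAME + n_samples]
```

Calling `generate(10, seed=1)` returns a list of ten integers in the range `[0, Q)`. Because the weights are random rather than trained, the output is noise; the point is only the control flow, in which a slow recurrent state conditions a fast per-sample predictor.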

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies