SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio

TL;DR
This paper introduces SampleRNN, a hierarchical neural network model for unconditional audio generation that produces audio one sample at a time. It captures variation over very long time spans on three datasets, and human evaluators prefer its samples over those of competing models.
Contribution
The paper presents a novel hierarchical architecture for audio generation that combines memory-less modules (autoregressive multilayer perceptrons) with stateful recurrent neural networks.
Findings
Model captures long-term temporal variations effectively
Human evaluators prefer generated samples from this model
Component analysis shows each part's contribution to performance
Abstract
In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variations in the temporal sequences over very long time spans, on three datasets of different nature. Human evaluation on the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.
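The hierarchy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes only two tiers, uses untrained random weights, and every name and size in it (FRAME_SIZE, HIDDEN, LEVELS, generate) is hypothetical. It shows only the data flow: a stateful recurrent tier updates once per frame, and a memory-less perceptron tier uses that state plus the most recent samples to draw one quantized sample at a time.

```python
import math
import random

# Illustrative sizes only; these are assumptions, not values from the paper.
FRAME_SIZE = 4   # samples per frame
HIDDEN = 8       # recurrent state size
LEVELS = 16      # quantization levels per sample


def rand_matrix(rng, rows, cols, bound=0.5):
    """Random weight matrix as a list of rows."""
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_unit(q):
    """Map an integer level in [0, LEVELS) to a float in [-1, 1]."""
    return 2.0 * q / (LEVELS - 1) - 1.0


def generate(num_frames, seed=0):
    rng = random.Random(seed)
    # Untrained random weights: the output is noise, only the flow matters.
    w_in = rand_matrix(rng, HIDDEN, FRAME_SIZE)            # previous frame -> state
    w_rec = rand_matrix(rng, HIDDEN, HIDDEN)               # state -> state
    w_out = rand_matrix(rng, LEVELS, HIDDEN + FRAME_SIZE)  # perceptron output layer

    hidden = [0.0] * HIDDEN
    samples = [LEVELS // 2] * FRAME_SIZE  # seed frame at the mid level
    for _ in range(num_frames):
        prev_frame = [to_unit(q) for q in samples[-FRAME_SIZE:]]
        # Stateful tier: one recurrent update per frame.
        hidden = [math.tanh(a + b)
                  for a, b in zip(matvec(w_in, prev_frame), matvec(w_rec, hidden))]
        for _ in range(FRAME_SIZE):
            # Memory-less tier: conditioned on the frame state and the
            # last FRAME_SIZE samples, including ones just generated.
            recent = [to_unit(q) for q in samples[-FRAME_SIZE:]]
            probs = softmax(matvec(w_out, hidden + recent))
            samples.append(rng.choices(range(LEVELS), weights=probs)[0])
    return samples[FRAME_SIZE:]  # drop the seed frame
```

Calling generate(5) returns 5 * FRAME_SIZE integer levels. The recurrent state is the only thing carried across frames, which is what lets the slow tier summarize long-range context while the fast tier handles sample-to-sample detail.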
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
