One Billion Audio Sounds from GPU-enabled Modular Synthesis
Joseph Turian, Jordie Shier, George Tzanetakis, Kirk McNally, and Max Henry

TL;DR
This paper introduces synth1B1, a dataset of 1 billion synthesized sounds paired with their synthesis parameters and generated on-the-fly by a GPU-accelerated modular synthesizer. It also demonstrates rank-based evaluation criteria for audio representations and a new approach to synthesizer hyperparameter optimization.
Contribution
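The paper presents synth1B1, the largest synthesized audio dataset to date, along with torchsynth, an open-source GPU-based modular synthesizer; two new audio datasets (FM synth timbre and subtractive synth pitch); rank-based evaluation criteria for existing audio representations; and a novel approach to synthesizer hyperparameter optimization.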
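The pairing of every sound with the parameters that produced it, and the idea of regenerating a sound from its index instead of storing it, can be illustrated with a minimal pure-Python sketch. Everything here is hypothetical: `sample_params`, `render`, and `get_sound` are illustrative names, not torchsynth's API, and the parameter ranges, the 8 kHz sample rate, and the three-module patch (LFO into oscillator, envelope into amplifier) are assumptions chosen for brevity. torchsynth itself is built on PyTorch and renders on the GPU.

```python
import math
import random

# Assumption: a low sample rate keeps this sketch fast; it is not torchsynth's rate.
SAMPLE_RATE = 8000


def sample_params(index):
    """Hypothetical: draw synthesis parameters deterministically from a sound's index."""
    rng = random.Random(index)  # the index acts as the random seed
    return {
        "midi_pitch": rng.uniform(36.0, 84.0),
        "attack_s": rng.uniform(0.005, 0.5),
        "decay_s": rng.uniform(0.05, 2.0),
        "mod_depth": rng.uniform(0.0, 1.0),
    }


def render(params, duration_s=4.0, sample_rate=SAMPLE_RATE):
    """Hypothetical patch: LFO -> oscillator frequency, envelope -> amplifier."""
    base_freq = 440.0 * 2 ** ((params["midi_pitch"] - 69.0) / 12.0)
    lfo_hz = 5.0
    n = int(duration_s * sample_rate)
    out = []
    phase = 0.0
    for i in range(n):
        t = i / sample_rate
        # Envelope module: linear attack, then exponential decay (continuous at the join).
        if t < params["attack_s"]:
            env = t / params["attack_s"]
        else:
            env = math.exp(-(t - params["attack_s"]) / params["decay_s"])
        # LFO module: a slow sine that modulates the oscillator frequency (vibrato).
        lfo = math.sin(2 * math.pi * lfo_hz * t)
        inst_freq = base_freq * (1.0 + 0.02 * params["mod_depth"] * lfo)
        # Oscillator module: accumulate phase so frequency changes stay click-free.
        phase += 2 * math.pi * inst_freq / sample_rate
        # Amplifier module: the envelope scales the oscillator output.
        out.append(env * math.sin(phase))
    return out


def get_sound(index):
    """Return (audio, params) for a sound, recomputed from its index alone."""
    params = sample_params(index)
    return render(params), params
```

Because the parameters are a pure function of the index, calling `get_sound(42)` twice yields identical audio and parameters, so a corpus built this way can be regenerated on demand rather than stored.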
Findings
The synth1B1 corpus is 100x larger than any audio dataset in the literature.
On a single GPU, torchsynth generates audio 16200x faster than real-time.
New rank-based evaluation criteria are demonstrated for assessing existing audio representations.
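The headline figures can be related with a little arithmetic. The sketch below assumes a 44.1 kHz output sample rate; that rate is our assumption, but it is consistent with the abstract's pairing of "16200x faster than real-time" with "714MHz". The total rendering time is a back-of-envelope estimate that assumes the quoted throughput holds for the whole corpus.

```python
SAMPLE_RATE_HZ = 44_100      # assumption: 44.1 kHz output audio
SPEEDUP = 16_200             # "16200x faster than real-time"
NUM_SOUNDS = 1_000_000_000   # 1 billion sounds
SECONDS_PER_SOUND = 4        # each sound is 4 seconds long

# Audio samples produced per wall-clock second at the quoted speedup.
samples_per_second = SAMPLE_RATE_HZ * SPEEDUP
# Total duration of the corpus, in seconds and in (Julian) years.
total_audio_seconds = NUM_SOUNDS * SECONDS_PER_SOUND
total_audio_years = total_audio_seconds / (365.25 * 24 * 3600)
# Wall-clock hours to render everything once, ignoring any overhead.
render_hours = total_audio_seconds / SPEEDUP / 3600

print(f"{samples_per_second / 1e6:.1f} MHz")        # 714.4 MHz
print(f"{total_audio_years:.1f} years of audio")    # 126.8 years of audio
print(f"{render_hours:.1f} GPU-hours at that rate")  # 68.6 GPU-hours at that rate
```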
Abstract
We release synth1B1, a multi-modal audio corpus consisting of 1 billion 4-second synthesized sounds, each paired with the synthesis parameters used to generate it. The dataset is 100x larger than any audio dataset in the literature. We also introduce torchsynth, an open-source modular synthesizer that generates the synth1B1 samples on-the-fly at 16200x faster than real-time (714 MHz) on a single GPU. In addition, we release two new audio datasets: FM synth timbre and subtractive synth pitch. Using these datasets, we demonstrate new rank-based evaluation criteria for existing audio representations. Finally, we propose a novel approach to synthesizer hyperparameter optimization.
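The abstract does not spell out the rank-based criteria. As a generic stand-in only, and not the paper's metric, one can ask whether a representation orders sounds the same way a known synthesis parameter does (for example, pitch in a pitch dataset) by computing the Spearman rank correlation between embedding distances and parameter distances from a reference sound. The function names, the Euclidean distance, and the single-reference design are all assumptions for illustration.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / (sx * sy)


def rank_agreement(embeddings, params, ref=0):
    """Hypothetical criterion: do embedding distances from `ref` rank like parameter distances?"""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    others = [i for i in range(len(embeddings)) if i != ref]
    emb_d = [dist(embeddings[i], embeddings[ref]) for i in others]
    par_d = [abs(params[i] - params[ref]) for i in others]
    return spearman(emb_d, par_d)
```

With `pitches = [48, 50, 52, 55, 60]`, an embedding that scales with pitch such as `[[p * 0.1, 1.0] for p in pitches]` returns a value near 1.0, while one whose distances shrink as pitch moves away from the reference returns a value near -1.0.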
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
