Soundata: A Python library for reproducible use of audio datasets

Magdalena Fuentes; Justin Salamon; Pablo Zinemanas; Martín Rocamora; Genís Paja; Irán R. Román; Marius Miron; Xavier Serra; Juan Pablo Bello

arXiv:2109.12690·cs.SD·October 5, 2021

Open Access

TL;DR

Soundata is a Python library that standardizes loading, validation, and reproducibility of diverse audio datasets, streamlining research workflows and reducing custom code.

Contribution

The paper introduces a unified, easy-to-use library for handling various audio datasets, improving reproducibility and standardization in audio research.

Findings

1. Enables quick dataset download and standardized loading
2. Provides tools for dataset validation against canonical versions
3. Improves reproducibility and reduces custom loader code

Abstract

Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need to write custom loaders in every project and improving reproducibility by providing tools to validate data against a canonical version. It speeds up research pipelines by allowing users to quickly download a dataset, load it into memory in a standardized and reproducible way, validate that the dataset is complete and correct, and more. Soundata is based on and inspired by mirdata, and is designed to complement it by working with environmental sound, bioacoustic, and speech datasets, among others. Soundata was created to be easy to use and easy to contribute to, and to increase reproducibility and standardize the usage of sound datasets in a flexible way.
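To make the idea of validating data against a canonical version concrete, the following is a minimal sketch using only the Python standard library. It is not Soundata's implementation or API. It assumes the canonical version is described by an index that maps each file's relative path to an expected checksum. The function names (`md5_of_file`, `validate_dataset`, `demo`), the index layout, and the choice of MD5 are all hypothetical and chosen only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_dataset(data_home, index):
    """Compare the files under data_home with a canonical index.

    `index` maps relative paths to expected MD5 checksums
    (a hypothetical layout, assumed for this sketch).
    Returns two lists of relative paths: (missing, invalid).
    """
    missing, invalid = [], []
    for rel_path, expected in sorted(index.items()):
        local = Path(data_home) / rel_path
        if not local.is_file():
            missing.append(rel_path)       # file absent from the local copy
        elif md5_of_file(local) != expected:
            invalid.append(rel_path)       # file present but content differs
    return missing, invalid


def demo():
    """Build a tiny fake dataset with one corrupted and one missing clip."""
    with tempfile.TemporaryDirectory() as data_home:
        root = Path(data_home)
        (root / "clip_a.wav").write_bytes(b"audio-a")
        (root / "clip_b.wav").write_bytes(b"corrupted")
        index = {
            "clip_a.wav": hashlib.md5(b"audio-a").hexdigest(),
            "clip_b.wav": hashlib.md5(b"audio-b").hexdigest(),
            "clip_c.wav": hashlib.md5(b"audio-c").hexdigest(),
        }
        return validate_dataset(root, index)
```

Calling `demo()` returns the missing and invalid paths for the toy dataset. Reporting the two categories separately lets a user tell an incomplete download apart from a corrupted or modified file, which is the distinction a "complete and correct" check needs.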

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies