audb -- Sharing and Versioning of Audio and Annotation Data in Python
Hagen Wierstorf, Johannes Wagner, Florian Eyben, Felix Burkhardt, Björn W. Schuller

TL;DR
audb is an open-source Python library that simplifies versioning, sharing, and accessing audio datasets with automatic dependency resolution, partial loading, and caching, facilitating dataset management for machine learning research.
Contribution
It introduces a standardized, lightweight Python tool for efficient dataset versioning, documentation, and sharing tailored for audio data in machine learning workflows.
Findings
Supports automatic dependency resolution for dataset versions
Enables partial dataset loading and local caching
Facilitates dataset management across individual and community scales
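The partial loading and local caching listed above can be pictured with a small sketch. This is not audb's implementation or API: `REMOTE`, `fetch`, and `load` are hypothetical names, and the "server" is an in-memory dictionary. It only shows the idea that a request names a subset of files and that anything already in the cache folder is not downloaded again.

```python
import os
import tempfile

# Hypothetical remote store standing in for a server: file path -> content.
REMOTE = {
    "audio/001.wav": b"speech-a",
    "audio/002.wav": b"speech-b",
    "audio/003.wav": b"speech-c",
}

# Record of every file actually transferred, to make cache hits visible.
downloads = []


def fetch(path):
    """Simulate downloading one file from the remote store."""
    downloads.append(path)
    return REMOTE[path]


def load(files, cache_root):
    """Return local paths for the requested files, downloading only cache misses."""
    local = {}
    for path in files:
        target = os.path.join(cache_root, path)
        if not os.path.exists(target):
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as fp:
                fp.write(fetch(path))
        local[path] = target
    return local


with tempfile.TemporaryDirectory() as cache_root:
    # First request asks for a single file (partial loading).
    load(["audio/001.wav"], cache_root)
    # Second request overlaps with the first; only 002.wav is fetched.
    load(["audio/001.wav", "audio/002.wav"], cache_root)

print(downloads)
```

In this sketch `audio/003.wav` is never transferred because no request asked for it, and `audio/001.wav` is transferred only once.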
Abstract
Driven by the need for larger and more diverse datasets to pre-train and fine-tune increasingly complex machine learning models, the number of datasets is rapidly growing. audb is an open-source Python library that supports versioning and documentation of audio datasets. It aims to provide a standardized and simple user-interface to publish, maintain, and access the annotations and audio files of a dataset. To efficiently store the data on a server, audb automatically resolves dependencies between versions of a dataset and only uploads newly added or altered files when a new version is published. The library supports partial loading of a dataset and local caching for fast access. audb is a lightweight library and can be interfaced from any machine learning library. It supports the management of datasets on a single PC, within a university or company, or within a whole research community.
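The abstract's point that only newly added or altered files are uploaded for a new version can be illustrated with a minimal sketch. It assumes files are compared by content checksum (MD5 is chosen here purely for illustration), and the names `checksum` and `files_to_upload` are hypothetical, not part of audb.

```python
import hashlib


def checksum(data):
    """Return the MD5 hex digest of a file's content."""
    return hashlib.md5(data).hexdigest()


def files_to_upload(previous, current):
    """List files that are new or changed relative to the previous version.

    previous: file path -> checksum stored for the last published version
    current:  file path -> raw content of the version about to be published
    """
    return sorted(
        path
        for path, data in current.items()
        if previous.get(path) != checksum(data)
    )


# Version 1.0.0 contains two audio files.
v1 = {"audio/001.wav": b"speech-a", "audio/002.wav": b"speech-b"}
v1_checksums = {path: checksum(data) for path, data in v1.items()}

# Version 1.1.0 alters one file and adds another; 001.wav is unchanged.
v2 = {
    "audio/001.wav": b"speech-a",
    "audio/002.wav": b"speech-b-fixed",
    "audio/003.wav": b"speech-c",
}

upload = files_to_upload(v1_checksums, v2)
print(upload)  # ['audio/002.wav', 'audio/003.wav']
```

The unchanged file is resolved by reference to the earlier version, so storage on the server grows only with the difference between versions.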
Taxonomy
Topics: Computational Physics and Python Applications
Methods: Lib
