Audio-Language Datasets of Scenes and Events: A Survey

Gijs Wijngaard; Elia Formisano; Michele Esposito; Michel Dumontier

arXiv:2407.06947 · cs.SD · February 10, 2025

TL;DR

This survey reviews 69 audio-language datasets, analyzing their characteristics and the challenges and opportunities they present for developing more diverse and effective audio-language models.

Contribution

It provides a comprehensive analysis of existing datasets, evaluates their variability and biases, and discusses key challenges and opportunities for future dataset development.

Findings

01 AudioSet has over two million samples from YouTube.

02 Freesound contains over one million samples from community contributions.

03 The survey identifies biases and imbalances in sound categories and language representation.

Abstract

Audio-language models (ALMs) generate linguistic descriptions of sound-producing events and scenes. Advances in dataset creation and computational power have led to significant progress in this domain. This paper surveys 69 datasets used to train ALMs, covering research up to September 2024 (https://github.com/GLJS/audio-datasets). It provides a comprehensive analysis of dataset origins, audio and linguistic characteristics, and use cases. Key sources include YouTube-based datasets like AudioSet, with over two million samples, and community platforms like Freesound, with over one million samples. Through principal component analysis of audio and text embeddings, the survey evaluates the acoustic and linguistic variability across datasets. It also analyzes data leakage through CLAP embeddings and examines sound category distributions to identify imbalances. Finally, the survey identifies…
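
The two embedding-based analyses named in the abstract can be illustrated with a minimal Python sketch. It is not the survey's own pipeline: the embeddings here are random stand-ins rather than real CLAP outputs, the helper names `pca_project` and `find_leaks` are hypothetical, and the 0.99 cosine-similarity threshold is an assumed value chosen only for illustration.

```python
import numpy as np


def pca_project(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row-vector embeddings onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # SVD of the centered matrix: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def find_leaks(train: np.ndarray, test: np.ndarray, threshold: float = 0.99):
    """Return (test_index, train_index) pairs with cosine similarity >= threshold.

    The threshold is an assumption for this sketch, not a value from the survey.
    """
    train_n = train / np.linalg.norm(train, axis=1, keepdims=True)
    test_n = test / np.linalg.norm(test, axis=1, keepdims=True)
    sims = test_n @ train_n.T  # shape: (n_test, n_train)
    return [(int(i), int(j)) for i, j in np.argwhere(sims >= threshold)]


# Synthetic stand-ins for embeddings of two dataset splits (assumed 512-dim).
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(100, 512))
test_emb = rng.normal(size=(20, 512))
test_emb[3] = train_emb[42]  # plant one exact duplicate to simulate leakage

projected = pca_project(np.vstack([train_emb, test_emb]))
leaks = find_leaks(train_emb, test_emb)
print(projected.shape, leaks)
```

Plotting `projected` colored by source dataset would give a rough picture of how much the datasets overlap in embedding space, while `leaks` lists candidate items that appear in both splits and would need manual confirmation.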

Code & Models

Repositories

gljs/audio-datasets
PyTorch · Official

Datasets

gijs/audio-datasets
dataset · 177 dl

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies