FSD50K: An Open Dataset of Human-Labeled Sound Events
Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra

TL;DR
FSD50K is a large, openly available dataset of over 51,000 human-labeled sound clips designed to advance sound event recognition research, addressing limitations of existing datasets like AudioSet.
Contribution
The paper introduces FSD50K, an open, freely distributable sound event dataset with detailed creation methodology and baseline classification experiments.
Findings
FSD50K contains over 51,000 clips with 200 sound classes.
Baseline sound event classification results are provided.
The paper discusses the dataset's limitations and the key factors to consider when using it.
Abstract
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes. However, AudioSet is not an open dataset as its official release consists of pre-computed audio features. Downloading the original audio tracks can be problematic due to YouTube videos gradually disappearing and usage rights issues. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51k audio clips totalling over 100h of audio manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the…
