FSD50K-Solo: Automated Curation of Single-Source Sound Events
Ningyuan Yang, Sile Yin, Li-Chia Yang, Bryce Irvin, Xiao Quan, Marko Stamenovic, Shuo Zhang

TL;DR
This paper presents FSD50K-Solo, a large-scale, high-quality, single-source sound event dataset that is automatically curated from open audio corpora using generative models and classifiers.
Contribution
It introduces a novel framework that combines a generative diffusion model with a pre-trained audio encoder and a discriminative classifier to identify single-source audio samples in large datasets and filter out multi-source ones.
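The filtering step can be pictured as a binary classifier over clip embeddings. This summary does not say which encoder or classifier the authors use, so the sketch below is only an illustration under stated assumptions: it takes embeddings as already extracted by some pre-trained audio encoder and fits a plain logistic-regression classifier to them. The function names (`train_logistic`, `keep_single_source`) and the 0.5 decision threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],  # assumed convention: 1 = single-source, 0 = multi-source
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Hypothetical helper: fit logistic regression with full-batch gradient descent."""
    dim = len(embeddings[0])
    w = [0.0] * dim
    b = 0.0
    n = len(embeddings)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of binary cross-entropy w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def keep_single_source(
    clips: Sequence[Tuple[str, Sequence[float]]],
    w: Sequence[float],
    b: float,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[str]:
    """Return IDs of clips whose predicted single-source probability is >= threshold."""
    kept = []
    for clip_id, emb in clips:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, emb)) + b)
        if p >= threshold:
            kept.append(clip_id)
    return kept
```

With toy 2-D embeddings where single-source clips cluster near `(1, 0)` and mixtures near `(0, 1)`, training on four labeled points and then calling `keep_single_source` on a new clip near each cluster keeps only the first. Raising the threshold trades recall for a cleaner curated set, which is the kind of knob a curation pipeline would tune against a human-labeled test set.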
Findings
The framework achieves strong performance on a human-curated test set.
FSD50K-Solo contains high-quality single-source audio samples.
The method establishes a scalable paradigm for open-source audio data curation.
Abstract
High-quality training datasets are essential for the performance of neural networks. However, the audio domain still lacks a large-scale, strongly labeled, single-source sound event dataset. The FSD50K dataset, despite being relatively large and open, contains a considerable fraction of multi-source samples in which background interference or overlapping events could limit the usefulness of the data. To address this challenge, we introduce a data curation framework designed for large-scale open audio corpora. Our approach leverages a generative diffusion model to synthesize clean single-class events, which are then used to construct controlled noisy mixtures for supervision. We subsequently employ a pre-trained audio encoder coupled with a discriminative classifier to automatically identify and filter out multi-source samples. Experiments show that our framework achieves strong performance on a human…
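One common way to build controlled mixtures is to add an interfering signal to a clean event at a chosen signal-to-noise ratio (SNR). The sketch below assumes the clean events (for example, diffusion-generated ones) and the interferers are already available as equal-length lists of samples; the SNR range, the labeling convention (1 = single-source, 0 = multi-source), and all function names are illustrative assumptions rather than details confirmed by this summary.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]


def mean_power(x: Sequence[float]) -> float:
    # Average squared amplitude of a waveform.
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean: Sequence[float], interference: Sequence[float], snr_db: float) -> List[float]:
    """Add interference to a clean event, scaled so the clean-to-interference power ratio is snr_db."""
    if len(clean) != len(interference):
        raise ValueError("clean and interference must have the same length")
    p_clean = mean_power(clean)
    p_int = mean_power(interference)
    if p_int == 0.0:
        raise ValueError("interference is silent")
    # Solve p_clean / (gain**2 * p_int) = 10 ** (snr_db / 10) for gain.
    gain = math.sqrt(p_clean / (p_int * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, interference)]


def make_training_examples(
    clean_events: Sequence[Sequence[float]],
    interferers: Sequence[Sequence[float]],
    snr_range_db: Tuple[float, float] = (-5.0, 20.0),  # hypothetical SNR range
    seed: int = 0,
) -> List[Example]:
    """Hypothetical helper: pair every clean event (label 1) with one synthetic mixture (label 0)."""
    rng = random.Random(seed)
    examples: List[Example] = []
    for clean in clean_events:
        examples.append((list(clean), 1))  # clean event -> single-source
        noise = rng.choice(interferers)
        snr_db = rng.uniform(*snr_range_db)
        examples.append((mix_at_snr(clean, noise, snr_db), 0))  # mixture -> multi-source
    return examples
```

Because the mixtures are built from clean events with known content, every training example carries an exact single-source or multi-source label, which is the point of constructing the supervision synthetically instead of labeling real recordings by hand.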
