FSD50K-Solo: Automated Curation of Single-Source Sound Events

Ningyuan Yang; Sile Yin; Li-Chia Yang; Bryce Irvin; Xiao Quan; Marko Stamenovic; Shuo Zhang

arXiv:2605.13931·eess.AS·May 19, 2026

TL;DR

This paper presents a framework that uses a generative diffusion model and a discriminative classifier to automatically curate single-source sound events from large open audio corpora, yielding the FSD50K-Solo dataset.

Contribution

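It introduces a framework in which a generative diffusion model synthesizes clean single-class events, which are used to build controlled noisy mixtures for supervision, and a pre-trained audio encoder with a discriminative classifier then identifies and filters out multi-source samples from large datasets.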

Findings

01 Framework achieves strong performance on a human-curated test set.

02 FSD50K-Solo contains high-quality single-source audio samples.

03 Method establishes a scalable paradigm for open-source audio data curation.

Abstract

High-quality training datasets are essential for the performance of neural networks. However, the audio domain still lacks a large-scale, strongly-labeled, and single-source sound event dataset. The FSD50K dataset, despite being relatively large and open, contains a considerable fraction of multi-source samples where background interference or overlapping events could limit the usefulness of the data. To address this challenge, we introduce a data curation framework designed for large-scale open audio corpora. Our approach leverages a generative diffusion model to synthesize clean single-class events to construct controlled noisy mixtures for supervision. We subsequently employ a pre-trained audio encoder coupled with a discriminative classifier to automatically identify and filter out multi-source samples. Experiments show that our framework achieves strong performance on a human…
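
The mixture-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mix_at_snr` and `build_supervision_set`, the SNR values, and the use of random noise as stand-ins for diffusion-generated clean events are all assumptions made for illustration. It shows one plausible way to turn clean single-class events into labeled single-source and multi-source training examples.

```python
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so the target-to-interferer power ratio equals
    `snr_db` (in dB), then add it to `target`. Returns (mixture, gain)."""
    p_target = np.mean(target ** 2)
    p_interferer = np.mean(interferer ** 2)
    gain = np.sqrt(p_target / (p_interferer * 10 ** (snr_db / 10)))
    return target + gain * interferer, gain

def build_supervision_set(clean_events, snr_choices_db, rng):
    """Hypothetical helper: for each clean event, emit the event itself
    (label 0, single-source) and a mixture with a different event
    (label 1, multi-source). Assumes equal-length waveforms."""
    waveforms, labels = [], []
    n = len(clean_events)
    for i, event in enumerate(clean_events):
        waveforms.append(event)
        labels.append(0)
        # Offset in [1, n-1] guarantees the interferer differs from the target.
        j = (i + int(rng.integers(1, n))) % n
        snr_db = float(rng.choice(snr_choices_db))
        mixture, _ = mix_at_snr(event, clean_events[j], snr_db)
        waveforms.append(mixture)
        labels.append(1)
    return waveforms, np.array(labels)

# Placeholder "clean events": random noise standing in for generated audio.
rng = np.random.default_rng(0)
events = [rng.standard_normal(16000) for _ in range(4)]
X, y = build_supervision_set(events, [0.0, 5.0, 10.0], rng)
print(len(X), y)
```

In a pipeline of the kind the abstract describes, the labeled waveforms would presumably be passed through the pre-trained audio encoder and the resulting embeddings used to train the classifier; that stage is omitted here because the source does not specify the encoder or classifier.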
