SoundCollage: Automated Discovery of New Classes in Audio Datasets
Ryuhaerang Choi, Soumyajit Chatterjee, Dimitris Spathis, Sung-Ju Lee, Fahim Kawsar, Mohammad Malekzadeh

TL;DR
SoundCollage is a framework that automatically discovers and labels new classes within existing audio datasets, enhancing their utility for training more accurate downstream audio classifiers.
Contribution
It introduces an audio decomposition pipeline and an automated, model-based annotation mechanism, along with a clarity measure that assesses the coherence of the discovered classes.
Findings
Improves downstream classifier accuracy by up to 34.7% within discovered classes.
Enhances classifier performance on held-out data by up to 4.5%.
Provides an open-source implementation for further research.
Abstract
Developing new machine learning applications often requires the collection of new datasets. However, existing datasets may already contain relevant information to train models for new purposes. We propose SoundCollage: a framework to discover new classes within audio datasets by incorporating (1) an audio pre-processing pipeline to decompose different sounds in audio samples, and (2) an automated model-based annotation mechanism to identify the discovered classes. Furthermore, we introduce the clarity measure to assess the coherence of the discovered classes for better training new downstream applications. Our evaluations show that the accuracy of downstream audio classifiers within discovered class samples and a held-out dataset improves over the baseline by up to 34.7% and 4.5%, respectively. These results highlight the potential of SoundCollage in making datasets reusable by labeling…
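The abstract's flow (decompose, annotate, then keep only coherent classes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the segments are taken to be already decomposed and embedded as small vectors, `annotate` is a hypothetical stand-in for the model-based annotator, and `coherence` is mean pairwise cosine similarity, which is not the paper's clarity measure (the abstract does not define it).

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def group_by_label(segments, annotate):
    """Group segment embeddings by the label a (hypothetical) annotator assigns."""
    classes = defaultdict(list)
    for segment in segments:
        classes[annotate(segment)].append(segment)
    return dict(classes)


def coherence(members):
    """Mean pairwise cosine similarity within a class.

    Illustrative stand-in only; NOT the clarity measure from the paper.
    Classes with fewer than two members score 0.0.
    """
    if len(members) < 2:
        return 0.0
    pairs = [
        (i, j)
        for i in range(len(members))
        for j in range(i + 1, len(members))
    ]
    return sum(cosine(members[i], members[j]) for i, j in pairs) / len(pairs)


def discover_classes(segments, annotate, min_coherence=0.8):
    """Keep only annotated classes whose coherence meets the threshold."""
    classes = group_by_label(segments, annotate)
    return {
        label: members
        for label, members in classes.items()
        if coherence(members) >= min_coherence
    }


# Toy data: 2-D vectors standing in for embeddings of decomposed audio segments.
toy_segments = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (1.0, 1.0)]


def toy_annotate(vec):
    """Hypothetical annotator: a fixed rule standing in for a model."""
    return "a" if vec[0] >= 0.5 and vec[1] < 0.5 else "b"


kept = discover_classes(toy_segments, toy_annotate)
```

In this toy run, class "a" contains two nearly parallel vectors and passes the threshold, while class "b" mixes dissimilar vectors and is dropped. The 0.8 threshold is arbitrary; a real pipeline would tune it, or replace the score entirely with the paper's own measure.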
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing
