SoundCollage: Automated Discovery of New Classes in Audio Datasets

Ryuhaerang Choi; Soumyajit Chatterjee; Dimitris Spathis; Sung-Ju Lee; Fahim Kawsar; Mohammad Malekzadeh

arXiv:2410.23008 · cs.SD · January 22, 2025

TL;DR

SoundCollage is a framework that automatically discovers and labels new classes within existing audio datasets, enhancing their utility for training more accurate downstream audio classifiers.

Contribution

It introduces an audio pre-processing pipeline that decomposes the different sounds in each audio sample, an automated model-based annotation mechanism that identifies the discovered classes, and a clarity measure that assesses how coherent those classes are.

Findings

01 Improves downstream classifier accuracy over the baseline by up to 34.7% on samples of the discovered classes.

02 Improves downstream classifier accuracy over the baseline by up to 4.5% on a held-out dataset.

03 Provides an open-source implementation for further research.

Abstract

Developing new machine learning applications often requires the collection of new datasets. However, existing datasets may already contain relevant information to train models for new purposes. We propose SoundCollage: a framework to discover new classes within audio datasets by incorporating (1) an audio pre-processing pipeline to decompose different sounds in audio samples, and (2) an automated model-based annotation mechanism to identify the discovered classes. Furthermore, we introduce the clarity measure to assess the coherence of the discovered classes for better training new downstream applications. Our evaluations show that the accuracy of downstream audio classifiers within discovered class samples and a held-out dataset improves over the baseline by up to 34.7% and 4.5%, respectively. These results highlight the potential of SoundCollage in making datasets reusable by labeling…
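The two-step flow described in the abstract (decompose each sample, then annotate the resulting segments) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the authors' implementation: `discover_new_classes`, `toy_decompose`, and `toy_annotate` are made-up names, audio clips are represented as plain strings, and the clarity measure is not modeled because the abstract does not define it.

```python
from collections import defaultdict


def discover_new_classes(samples, decompose, annotate, known_labels):
    """Group decomposed segments by predicted label.

    Only labels that are absent from the dataset's original label set
    are kept, so the result maps each *new* class to its segments.
    """
    discovered = defaultdict(list)
    for sample_id, audio in samples.items():
        for segment in decompose(audio):      # step 1: split a clip into sounds
            label = annotate(segment)         # step 2: model-based annotation
            if label not in known_labels:
                discovered[label].append((sample_id, segment))
    return dict(discovered)


# Toy stand-ins (assumptions): a "clip" is a string of sound names joined
# by "+", decomposition splits on "+", and annotation returns the name.
def toy_decompose(audio):
    return audio.split("+")


def toy_annotate(segment):
    return segment.strip()


samples = {
    "clip_a": "dog_bark + rain",
    "clip_b": "siren + rain",
    "clip_c": "dog_bark",
}
known = {"dog_bark", "siren"}  # labels the dataset already has

new_classes = discover_new_classes(samples, toy_decompose, toy_annotate, known)
for label, segments in new_classes.items():
    print(label, [sample_id for sample_id, _ in segments])
```

In this toy run, "rain" never appears in the original label set but occurs in two clips, so it surfaces as a candidate new class. A real pipeline would replace the stand-ins with a source-separation step and a pretrained audio annotator.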

Code & Models

Repositories

nokia-bell-labs/audio-class-discovery
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing