Leveraging Self-supervised Audio Representations for Data-Efficient Acoustic Scene Classification

Yiqiang Cai; Shengchen Li; Xi Shao

arXiv:2408.14862·cs.SD·August 28, 2024

TL;DR

This paper demonstrates that self-supervised audio representations, combined with ensembling and knowledge distillation, enable data-efficient and accurate acoustic scene classification with limited labeled data.

Contribution

The paper introduces an ASC system that combines SSL audio representations, ensembling, and knowledge distillation to improve accuracy and efficiency.

Findings

01

Achieved 56.7% average accuracy in ASC.

02

SSL representations significantly improve performance with limited labeled data.

03

Ensembling and knowledge distillation further enhance accuracy.

Abstract

Acoustic scene classification (ASC) predominantly relies on supervised approaches. However, acquiring labeled data for training ASC models is often costly and time-consuming. Recently, self-supervised learning (SSL) has emerged as a powerful method for extracting features from unlabeled audio data, benefiting many downstream audio tasks. This paper proposes a data-efficient and low-complexity ASC system by leveraging self-supervised audio representations extracted from general-purpose audio datasets. We introduce BEATs, an audio SSL pre-trained model, to extract the general representations from AudioSet. Through extensive experiments, it has been demonstrated that the self-supervised audio representations can help to achieve high ASC accuracy with limited labeled fine-tuning data. Furthermore, we find that ensembling the SSL models fine-tuned with different strategies contributes to a…
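
The combination of ensembling and knowledge distillation mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temperature, the weighting factor `alpha`, and the choice of averaging teacher probabilities are all assumptions made for illustration, and the logits are made-up numbers. It shows one common way to turn several fine-tuned teacher models into soft targets for a smaller student.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits, temperature=1.0):
    """Average the temperature-softened probabilities of several teachers.

    Averaging probabilities is an assumed ensembling rule; other rules
    (e.g. averaging logits) are equally plausible.
    """
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    num_classes = len(probs[0])
    return [sum(p[i] for p in probs) / len(probs) for i in range(num_classes)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy.

    temperature and alpha are hypothetical hyperparameters.
    """
    soft_targets = ensemble_soft_targets(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft-target term on a comparable scale.
    kd_term = kl_divergence(soft_targets, student_soft) * temperature ** 2
    ce_term = -math.log(softmax(student_logits)[label] + 1e-12)
    return alpha * kd_term + (1.0 - alpha) * ce_term


# Illustrative logits for a 3-class problem (made-up values).
teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
student = [1.0, 0.8, -0.2]
loss = distillation_loss(student, teachers, label=0)
```

In a real training loop the same quantities would be computed on batched tensors in a deep learning framework; the plain-Python version here only makes the arithmetic explicit.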

Code & Models

Repositories

yqcai888/easy_dcase_task1
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis