Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge

Florian Schmid; Paul Primus; Toni Heittola; Annamaria Mesaros; Irene Martín-Morató; Khaled Koutini; Gerhard Widmer

arXiv:2405.10018·eess.AS·July 19, 2024

TL;DR

This paper presents a challenge for developing data-efficient, low-complexity acoustic scene classifiers, introducing a new real-world scenario with limited training data, and reports on multiple systems outperforming the baseline.

Contribution

It introduces a new challenge setup for acoustic scene classification focusing on data efficiency and low complexity, with a baseline system and evaluation of multiple submissions.

Findings

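1. The large majority of submitted systems outperform the baseline.

2. The top-ranked system's accuracy ranges from 54.3% on the smallest training subset to 61.8% on the largest.

3. Submissions achieve significant relative improvements over the baseline.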

Abstract

This article describes the Data-Efficient Low-Complexity Acoustic Scene Classification Task in the DCASE 2024 Challenge and the corresponding baseline system. The task setup is a continuation of previous editions (2022 and 2023), which focused on recording device mismatches and low-complexity constraints. This year's edition introduces an additional real-world problem: participants must develop data-efficient systems for five scenarios, which progressively limit the available training data. The provided baseline system is based on an efficient, factorized CNN architecture constructed from inverted residual blocks and uses Freq-MixStyle to tackle the device mismatch problem. The task received 37 submissions from 17 teams, with the large majority of systems outperforming the baseline. The top-ranked system's accuracy ranges from 54.3% on the smallest to 61.8% on the largest subset,…
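To make the Freq-MixStyle idea mentioned in the abstract more concrete, the following is a minimal NumPy sketch, not the baseline's actual code. It assumes inputs shaped (batch, channels, frequency, time), normalizes each frequency bin with its own mean and standard deviation, and then re-scales it with statistics mixed between randomly paired samples in the batch. The function name `freq_mixstyle` and the default values of `alpha` and `p` are illustrative assumptions.

```python
import numpy as np


def freq_mixstyle(x, alpha=0.3, p=0.7, eps=1e-6, rng=None):
    """Illustrative Freq-MixStyle sketch (hypothetical helper, not the official code).

    x: array of shape (batch, channels, freq, time).
    alpha: Beta distribution parameter for the mixing coefficient (assumed value).
    p: probability of applying the augmentation to a batch (assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Skip the augmentation with probability 1 - p.
    if rng.random() > p:
        return x

    batch = x.shape[0]

    # Frequency-wise statistics: one mean/std per sample and frequency bin,
    # computed over the channel and time axes.
    mu = x.mean(axis=(1, 3), keepdims=True)
    sigma = np.sqrt(x.var(axis=(1, 3), keepdims=True) + eps)
    x_norm = (x - mu) / sigma

    # One mixing coefficient per sample, and a random pairing within the batch.
    lam = rng.beta(alpha, alpha, size=(batch, 1, 1, 1))
    perm = rng.permutation(batch)

    # Convex combination of each sample's statistics with its partner's.
    mu_mix = lam * mu + (1.0 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1.0 - lam) * sigma[perm]

    return x_norm * sigma_mix + mu_mix
```

Because recording devices mainly differ in their frequency response, mixing per-frequency statistics across samples exposes the model to spectral colorations it has not seen paired with a given scene. In a training loop, a function like this would typically be applied to the spectrogram batch only during training and left out at inference time.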

Code & Models

Repositories

CPJKU/dcase2024_task1_baseline
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis