Low-complexity acoustic scene classification in DCASE 2022 Challenge
Irene Martín-Morató, Francesco Paissan, Alberto Ancilotto, Toni Heittola, Annamaria Mesaros, Elisabetta Farella, Alessio Brutti, Tuomas Virtanen

TL;DR
This paper analyzes low-complexity acoustic scene classification in the DCASE 2022 Challenge, focusing on model size, computational constraints, and performance of various submissions.
Contribution
It provides an overview of the challenge's low-complexity constraints, baseline system design, and comparative analysis of submitted models.
Findings
Most submissions outperformed the baseline system.
The top system reached 59.6% accuracy and a log loss of 1.091 with 121 K parameters, against the baseline's 44.2% accuracy and 1.532 log loss with 46.5 K parameters.
The challenge demonstrated effective low-complexity acoustic scene classification methods.
Abstract
This paper presents an analysis of the Low-Complexity Acoustic Scene Classification task in DCASE 2022 Challenge. The task was a continuation from the previous years, but the low-complexity requirements were changed to the following: the maximum number of allowed parameters, including the zero-valued ones, was 128 K, with parameters being represented using INT8 numerical format; and the maximum number of multiply-accumulate operations at inference time was 30 million. The provided baseline system is a convolutional neural network which employs post-training quantization of parameters, resulting in 46.5 K parameters, and 29.23 million multiply-and-accumulate operations (MMACs). Its performance on the evaluation data is 44.2% accuracy and 1.532 log-loss. In comparison, the top system in the challenge obtained an accuracy of 59.6% and a log loss of 1.091, having 121 K parameters and 28…
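The complexity limits stated in the abstract can be expressed as a simple budget check. The following is a minimal sketch, not the challenge's official tooling: the class `SystemComplexity` and the helpers `within_limits` and `model_size_bytes` are hypothetical names, "K" is assumed to mean 1,000, and the baseline figures are the rounded values quoted in the abstract.

```python
from dataclasses import dataclass

# Limits quoted in the abstract. Reading "128 K" as 128,000 is an assumption.
MAX_PARAMS = 128_000        # parameters, zero-valued ones included
MAX_MACS = 30_000_000       # multiply-accumulate operations at inference time
BYTES_PER_PARAM = 1         # INT8 representation: one byte per parameter


@dataclass
class SystemComplexity:
    """Hypothetical record of a system's complexity figures."""
    name: str
    params: int  # total parameter count
    macs: int    # multiply-accumulate operations at inference time


def within_limits(system: SystemComplexity) -> bool:
    """Return True if the system satisfies both complexity limits."""
    return system.params <= MAX_PARAMS and system.macs <= MAX_MACS


def model_size_bytes(system: SystemComplexity) -> int:
    """Parameter memory in bytes under the INT8 assumption."""
    return system.params * BYTES_PER_PARAM


# Baseline figures as rounded in the abstract: 46.5 K parameters, 29.23 MMACs.
baseline = SystemComplexity("baseline", params=46_500, macs=29_230_000)

print(within_limits(baseline))     # True
print(model_size_bytes(baseline))  # 46500
```

Under these assumptions the baseline uses a little over a third of the parameter budget but almost the entire MAC budget, which suggests that the operation count, rather than the model size, is the tighter constraint for that architecture.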
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
