Data-Efficient Low-Complexity Acoustic Scene Classification via Distilling and Progressive Pruning

Bing Han; Wen Huang; Zhengyang Chen; Anbai Jiang; Pingyi Fan; Cheng Lu; Zhiqiang Lv; Jia Liu; Wei-Qiang Zhang; Yanmin Qian

arXiv:2410.20775·cs.SD·May 8, 2025

Open Access

TL;DR

This paper introduces a data-efficient, low-complexity acoustic scene classification system using a new architecture, knowledge distillation, and progressive pruning, achieving state-of-the-art results and winning the DCASE2024 Challenge.

Contribution

The paper presents Rep-Mobile, a novel low-complexity architecture, combined with effective training strategies including knowledge distillation and progressive pruning, for improved acoustic scene classification.
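Knowledge distillation trains a small student model to match the outputs of a larger teacher. The sketch below shows one common formulation (temperature-softened KL divergence blended with cross-entropy); this page does not state which loss the paper actually uses, so the function name `distillation_loss`, the `temperature` and `alpha` values, and the 10-class toy logits are all illustrative assumptions.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic soft-target distillation loss (assumed form, not the paper's).

    alpha weights the teacher-matching term; (1 - alpha) weights the
    ordinary cross-entropy against the hard labels.
    """
    log_p_student = log_softmax(student_logits / temperature)
    log_p_teacher = log_softmax(teacher_logits / temperature)
    p_teacher = np.exp(log_p_teacher)
    # KL(teacher || student), averaged over the batch
    kl = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1).mean()
    # Cross-entropy of the student against the ground-truth labels
    ce = -log_softmax(student_logits)[np.arange(len(labels)), labels].mean()
    # temperature**2 keeps the soft-target term on a scale comparable to ce
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce

# Toy batch: 6 clips, 10 hypothetical scene classes
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(6, 10))
student_logits = rng.normal(size=(6, 10))
labels = rng.integers(0, 10, size=6)

loss = distillation_loss(student_logits, teacher_logits, labels)
loss_when_matched = distillation_loss(teacher_logits, teacher_logits, labels, alpha=1.0)
```

With `alpha=1.0` and a student that reproduces the teacher's logits exactly, the loss is zero, which is a quick sanity check for the soft-target term.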

Findings

1. Achieves state-of-the-art performance on the TAU dataset.
2. Wins first place in the DCASE2024 Challenge.
3. Demonstrates improved data efficiency and reduced computational complexity.

Abstract

The goal of the acoustic scene classification (ASC) task is to classify recordings into one of the predefined acoustic scene classes. However, in real-world scenarios, ASC systems often encounter challenges such as recording device mismatch, low-complexity constraints, and the limited availability of labeled data. To alleviate these issues, this paper builds a data-efficient, low-complexity ASC system with a new model architecture and better training strategies. Specifically, we first design a new low-complexity architecture named Rep-Mobile by integrating multiple convolution branches that can be reparameterized at inference. Compared to other models, it achieves better performance and lower computational complexity. Then we apply the knowledge distillation strategy and provide a comparison of the data efficiency of the teacher model with different architectures. Finally, we…
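The abstract notes that the convolution branches can be reparameterized at inference. A minimal NumPy sketch of that general idea follows: because convolution is linear, parallel branches can be folded into one kernel, so the deployed model pays for a single convolution. The branch layout used here (3x3, 1x1, and identity), the names `conv2d_same` and `merge_branches`, and the tensor shapes are assumptions for illustration, not the actual Rep-Mobile block; batch normalization, which such blocks usually fold in as well, is left out.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same' cross-correlation, stride 1, no bias.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with k odd.
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Input window shifted by (i, j), shape (C_in, H, W)
            patch = xp[:, i:i + h, j:j + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out

def merge_branches(w3, w1, with_identity):
    """Fold a 1x1 branch (and optionally an identity branch) into a 3x3 kernel."""
    merged = w3.copy()
    # A 1x1 kernel acts only at the centre tap of the 3x3 kernel
    merged[:, :, 1, 1] += w1[:, :, 0, 0]
    if with_identity:
        # Identity equals a centre tap of 1 from channel c to channel c
        # (requires C_out == C_in)
        c = merged.shape[0]
        merged[np.arange(c), np.arange(c), 1, 1] += 1.0
    return merged

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))        # toy feature map
w3 = rng.normal(size=(4, 4, 3, 3))    # 3x3 branch
w1 = rng.normal(size=(4, 4, 1, 1))    # 1x1 branch

# Training-time form: three parallel branches summed
multi_branch = conv2d_same(x, w3) + conv2d_same(x, w1) + x
# Inference-time form: one merged 3x3 convolution
single_branch = conv2d_same(x, merge_branches(w3, w1, with_identity=True))
max_abs_diff = float(np.max(np.abs(multi_branch - single_branch)))
```

The two forms agree up to floating-point rounding, so the extra branches add capacity during training without adding cost at inference.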

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis