Low-complexity CNNs for Acoustic Scene Classification
Arshdeep Singh, Mark D. Plumbley

TL;DR
This paper introduces a low-complexity CNN framework for acoustic scene classification that reduces resource requirements through pruning, quantization, and ensembling, and achieves competitive performance on the publicly available DCASE 2022 Task 1 dataset.
Contribution
The paper proposes a low-complexity CNN architecture, shrinks it further with pruning and quantization to cut parameters and memory, and combines several such CNNs in an ensemble to improve acoustic scene classification performance.
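The following is a minimal sketch of a prune-then-quantize pipeline of the kind described above. It is an illustration under stated assumptions, not the authors' method: the summary does not give the pruning criterion or bit width, so filters are ranked by L1 norm (a common generic criterion) and weights are quantized to symmetric per-tensor int8. The names `prune_filters`, `quantize_int8`, and `dequantize` are hypothetical.

```python
from typing import List, Tuple

# A conv layer is modeled as a list of filters; each filter is a flat list of weights.
Filter = List[float]


def l1_norm(f: Filter) -> float:
    """Sum of absolute weights, used here as a filter-importance score."""
    return sum(abs(w) for w in f)


def prune_filters(filters: List[Filter], keep_ratio: float) -> List[int]:
    """Return sorted indices of filters to keep, ranked by L1 norm.

    Assumption: L1-norm ranking is a generic stand-in; the paper's criterion may differ.
    """
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]), reverse=True)
    return sorted(ranked[:n_keep])


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127].

    Assumption: 8-bit weights; the summary does not state the bit width used.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


if __name__ == "__main__":
    # Hypothetical layer: 4 filters with 3 weights each.
    filters = [
        [0.5, -0.4, 0.3],
        [0.01, 0.02, -0.01],
        [-0.9, 0.8, 0.1],
        [0.05, -0.05, 0.0],
    ]
    kept = prune_filters(filters, keep_ratio=0.5)
    kept_weights = [w for i in kept for w in filters[i]]
    q, scale = quantize_int8(kept_weights)
    print("kept filters:", kept)
    print("int8 weights:", q, "scale:", scale)
```

Pruning runs first so that only the surviving filters are quantized; the rounding error per weight is bounded by half the quantization step (`scale / 2`).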
Findings
Approximately 60K parameters in the ensemble model
Requires 19 million multiply-accumulate operations (MACs)
Improves accuracy by 2-4 percentage points over the baseline
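The parameter and MAC figures above follow from standard counting rules for convolutional layers. The sketch below shows those rules; the layer shape (16 to 32 channels, 3x3 kernel, 64x40 output map) is hypothetical and not taken from the paper, and 60,000 is used only as a round stand-in for the reported ~60K parameters.

```python
def conv2d_params(c_in: int, c_out: int, k_h: int, k_w: int, bias: bool = True) -> int:
    """Weights plus optional bias for a standard (non-grouped) 2D convolution."""
    return c_out * c_in * k_h * k_w + (c_out if bias else 0)


def conv2d_macs(c_in: int, c_out: int, k_h: int, k_w: int, h_out: int, w_out: int) -> int:
    """One multiply-accumulate per kernel weight, per output channel, per output position."""
    return h_out * w_out * c_out * c_in * k_h * k_w


def param_memory_bytes(n_params: int, bits: int) -> int:
    """Storage needed for n_params weights at the given bit width."""
    return n_params * bits // 8


# Hypothetical layer: 16 -> 32 channels, 3x3 kernel, 64x40 output feature map.
print(conv2d_params(16, 32, 3, 3))           # 4640 parameters
print(conv2d_macs(16, 32, 3, 3, 64, 40))     # 11796480 MACs

# ~60K parameters: float32 storage vs. 8-bit storage.
print(param_memory_bytes(60_000, 32))        # 240000 bytes
print(param_memory_bytes(60_000, 8))         # 60000 bytes
```

Summing `conv2d_macs` over every layer of every ensemble member gives the total inference cost, which is the quantity behind a figure such as 19 million MACs; quantization changes memory, not the MAC count.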
Abstract
This paper presents a low-complexity framework for acoustic scene classification (ASC). Most of the frameworks designed for ASC use convolutional neural networks (CNNs) due to their learning ability and improved performance compared to hand-engineered features. However, CNNs are resource-hungry due to their large size and high computational complexity. Therefore, CNNs are difficult to deploy on resource-constrained devices. This paper addresses the problem of reducing the computational complexity and memory requirement in CNNs. We propose a low-complexity CNN architecture, and apply pruning and quantization to further reduce the parameters and memory. We then propose an ensemble framework that combines various low-complexity CNNs to improve the overall performance. An experimental evaluation of the proposed framework is performed on the publicly available DCASE 2022 Task 1 that focuses…
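One common way to combine several low-complexity CNNs is to average their class probabilities and take the most probable class. The sketch below assumes that rule; the abstract does not specify how the ensemble combines its members, so treat `ensemble_predict` as a hypothetical illustration rather than the paper's procedure. Each inner list holds one model's logits for a single audio clip.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average per-model class probabilities and return (predicted class, averaged probs).

    Assumption: unweighted probability averaging; the paper's combination rule may differ.
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    pred = max(range(n_classes), key=lambda c: avg[c])
    return pred, avg


if __name__ == "__main__":
    # Hypothetical logits from three small CNNs over three scene classes.
    logits = [
        [2.0, 1.0, 0.1],
        [0.5, 2.5, 0.2],
        [1.8, 1.7, 0.0],
    ]
    pred, avg = ensemble_predict(logits)
    print("predicted class:", pred)
    print("averaged probabilities:", [round(p, 3) for p in avg])
```

Averaging probabilities lets a confident member outweigh two uncertain ones, which can yield a different decision than a simple majority vote over each model's top class.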
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Pruning
