E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks
Arshdeep Singh, Haohe Liu, Mark D. Plumbley

TL;DR
This paper introduces E-PANNs, a pruned version of pre-trained audio neural networks (PANNs) that needs 36% less computation and 70% less memory than the original model while slightly improving sound recognition performance.
Contribution
The paper presents a pruning-based method that reduces the computational complexity and storage size of PANNs, making deployment on resource-constrained devices more practical while slightly improving recognition performance.
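The summary does not say which pruning criterion the authors use, so the following is only a minimal sketch of one common form of structured pruning: ranking the filters of a convolutional layer by L1 norm and dropping the weakest ones. The L1 criterion, the function names `prune_filters_l1` and `prune_next_layer_inputs`, the `keep_ratio` parameter, and the toy layer shapes are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def prune_filters_l1(weights, keep_ratio):
    """Keep the filters with the largest L1 norm (assumed criterion).

    weights: array of shape (out_channels, in_channels, kh, kw).
    Returns the pruned weights and the sorted indices of kept filters.
    """
    n_out = weights.shape[0]
    n_keep = max(1, int(round(n_out * keep_ratio)))
    # One importance score per output filter: sum of absolute weights.
    scores = np.abs(weights).reshape(n_out, -1).sum(axis=1)
    # Take the n_keep highest-scoring filters, then restore original order.
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return weights[keep], keep

def prune_next_layer_inputs(next_weights, keep):
    """Drop the input channels of the next layer that fed on removed filters."""
    return next_weights[:, keep]

# Toy two-layer example with random weights (hypothetical shapes).
rng = np.random.default_rng(0)
conv1 = rng.normal(size=(8, 3, 3, 3))    # 8 filters over 3 input channels
conv2 = rng.normal(size=(16, 8, 3, 3))   # consumes the 8 outputs of conv1

pruned1, kept = prune_filters_l1(conv1, keep_ratio=0.5)
pruned2 = prune_next_layer_inputs(conv2, kept)

print(pruned1.shape, pruned2.shape)
```

Removing whole filters, rather than zeroing individual weights, shrinks both the stored parameters and the multiply-accumulate count without requiring sparse kernels, which is why structured pruning is a natural fit for edge devices. In practice a pruned network is usually fine-tuned afterwards to recover accuracy.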
Findings
E-PANNs requires 36% less computation than the original PANNs.
E-PANNs uses 70% less memory than the original PANNs.
E-PANNs slightly outperforms original PANNs in sound recognition.
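To make the relative figures above concrete, the short sketch below converts a percentage reduction into a remaining cost and an equivalent compression factor. The baseline values of 100 units are placeholders chosen for readability, not measurements from the paper, and the helper names are hypothetical.

```python
def after_reduction(baseline, percent_reduction):
    """Cost remaining after cutting `percent_reduction` percent of `baseline`."""
    return baseline * (1.0 - percent_reduction / 100.0)

def compression_factor(percent_reduction):
    """How many times smaller the reduced quantity is than the original."""
    return 1.0 / (1.0 - percent_reduction / 100.0)

# Placeholder baselines: 100 arbitrary units each (not figures from the paper).
baseline_compute = 100.0
baseline_memory = 100.0

pruned_compute = after_reduction(baseline_compute, 36)  # roughly 64 units remain
pruned_memory = after_reduction(baseline_memory, 70)    # roughly 30 units remain

print(f"compute: {pruned_compute:.1f} units, {compression_factor(36):.2f}x smaller")
print(f"memory:  {pruned_memory:.1f} units, {compression_factor(70):.2f}x smaller")
```

Read this way, a 70% memory reduction means the model is a little over three times smaller, while a 36% compute reduction corresponds to roughly a 1.56x saving.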
Abstract
Sounds carry an abundance of information about activities and events in our everyday environment, such as traffic noise, road works, music, or people talking. Recent machine learning methods, such as convolutional neural networks (CNNs), have been shown to be able to automatically recognize sound activities, a task known as audio tagging. One such method, pre-trained audio neural networks (PANNs), provides a neural network which has been pre-trained on over 500 sound classes from the publicly available AudioSet dataset, and can be used as a baseline or starting point for other tasks. However, the existing PANNs model has a high computational complexity and large storage requirement. This could limit the potential for deploying PANNs on resource-constrained devices, such as on-the-edge sound sensors, and could lead to high energy consumption if many such devices were deployed. In this…
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Pruning
