Investigation of Feature Selection and Pooling Methods for Environmental Sound Classification
Parinaz Binandeh Dehaghani, Danilo Pena, A. Pedro Aguiar

TL;DR
This study evaluates feature selection and pooling strategies for environmental sound classification (ESC) with lightweight CNNs, focusing on Sparse Salient Region Pooling (SSRP). On ESC-50, the best SSRP variant reaches up to 80.69% accuracy, compared with 66.75% for the baseline CNN and 37.60% for a PCA-reduced model.
Contribution
The paper introduces and assesses SSRP variants for ESC, showing that they are more effective than PCA and baseline CNNs in resource-limited environments.
Findings
SSRP-T achieves up to 80.69% accuracy on ESC-50
SSRP methods outperform the PCA-reduced model (37.60%) and the baseline CNN (66.75%)
Sparse pooling enhances efficiency and robustness
Abstract
This paper explores the impact of dimensionality reduction and pooling methods on Environmental Sound Classification (ESC) using lightweight CNNs. We evaluate Sparse Salient Region Pooling (SSRP) and its variants, SSRP-Basic (SSRP-B) and SSRP-Top-K (SSRP-T), under various hyperparameter settings and compare them with Principal Component Analysis (PCA). Experiments on the ESC-50 dataset demonstrate that SSRP-T achieves up to 80.69% accuracy, significantly outperforming both the baseline CNN (66.75%) and the PCA-reduced model (37.60%). Our findings confirm that a well-tuned sparse pooling strategy provides a robust, efficient, and high-performing solution for ESC tasks, particularly in resource-constrained scenarios where balancing accuracy and computational cost is crucial.
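The abstract names SSRP-B and SSRP-T but does not define them. The sketch below is one plausible reading, not the authors' implementation: it assumes a CNN feature map of shape (channels, frequency, time), averages over sliding time windows of a hypothetical width `w`, keeps the largest window mean (SSRP-B) or the mean of the `k` largest (SSRP-T) per channel and frequency bin, and then averages over frequency. The function names, `w`, and `k` are illustrative assumptions.

```python
import numpy as np


def window_means(x, w):
    """Mean of every length-w sliding window along the last (time) axis."""
    # sliding_window_view appends the window axis: (..., T - w + 1, w)
    windows = np.lib.stride_tricks.sliding_window_view(x, w, axis=-1)
    return windows.mean(axis=-1)


def ssrp_b(x, w):
    """Assumed SSRP-B: most salient time window per (channel, frequency)."""
    m = window_means(x, w)               # (C, F, T - w + 1)
    return m.max(axis=-1).mean(axis=-1)  # max over time, mean over frequency


def ssrp_t(x, w, k):
    """Assumed SSRP-T: mean of the k most salient time windows."""
    m = window_means(x, w)               # (C, F, T - w + 1)
    k = min(k, m.shape[-1])              # cannot keep more windows than exist
    top_k = np.sort(m, axis=-1)[..., -k:]
    return top_k.mean(axis=-1).mean(axis=-1)


# Toy feature map: 4 channels, 5 frequency bins, 20 time frames.
features = np.random.default_rng(0).normal(size=(4, 5, 20))
pooled_b = ssrp_b(features, w=4)         # one value per channel
pooled_t = ssrp_t(features, w=4, k=3)    # one value per channel
```

Under these assumptions, SSRP-T with `k=1` reduces to SSRP-B, and SSRP-B with `w` equal to the full time length reduces to global average pooling, which is why `w` and `k` are natural hyperparameters to tune.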
Taxonomy
Topics: Music and Audio Processing · Animal Vocal Communication and Behavior · Noise Effects and Management
