A Comparative Study on Approaches to Acoustic Scene Classification using CNNs
Ishrat Jahan Ananya, Sarah Suad, Shadab Hafiz Choudhury, Mohammad Ashrafuzzaman Khan

TL;DR
This paper compares different sound representations like spectrograms, MFCCs, and embeddings for acoustic scene classification using CNNs, finding spectrograms yield the best accuracy across indoor and outdoor environments.
Contribution
It systematically evaluates the impact of various sound representations on CNN-based acoustic scene classification accuracy, providing insights and guidelines for improved performance.
Findings
Spectrograms outperform MFCCs and embeddings in classification accuracy.
CNNs trained on spectrograms achieve the highest environment recognition rates.
MFCCs result in the lowest classification accuracy among the tested representations.
Abstract
Acoustic scene classification is the process of characterizing and classifying environments from sound recordings. The first step is to generate features (representations) from the recorded sound; the background environment is then classified from those features. However, the choice of representation has a dramatic effect on classification accuracy. In this paper, we explored the effect of three such representations on classification accuracy using neural networks. We investigated spectrogram, MFCC, and embedding representations using different CNNs and autoencoders. Our dataset consists of sounds from three settings, each recorded in indoor and outdoor environments, so it contains sound from six different kinds of environments. We found that the spectrogram representation has the highest classification accuracy while MFCC has the lowest. We reported our…
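To make the "generate features from the recorded sound" step concrete, here is a minimal sketch of a log-magnitude spectrogram, the kind of time-frequency image a CNN can take as input. It is an illustration only, not the authors' pipeline: the function name `log_spectrogram`, the frame length, hop size, Hann window, and the synthetic test tone are all assumptions, and a naive DFT from the standard library stands in for the FFT routine a real audio toolkit would use.

```python
import cmath
import math


def log_spectrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Return one row per frame; each row holds log-magnitudes (dB)
    for frequency bins 0 .. frame_len // 2.

    Hypothetical helper for illustration; parameters are assumed values.
    """
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    rows = []
    # Slide a window over the signal, skipping any incomplete final frame.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame at bin k (O(N^2), fine for a sketch).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # Convert magnitude to decibels; eps avoids log10(0).
            row.append(20 * math.log10(abs(acc) + eps))
        rows.append(row)
    return rows


# Assumed example input: a 1 kHz sine tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = log_spectrogram(tone)

# Bin spacing is sample_rate / frame_len = 125 Hz, so the tone sits in bin 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

With these assumed settings, each row of `spec` is one time frame and each column one frequency bin, so the result can be treated as a single-channel image. MFCCs are typically derived from a spectrogram like this one by applying a mel filterbank, a logarithm, and a discrete cosine transform, which compresses each frame into a handful of coefficients.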
