Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks
M. Huzaifah

TL;DR
This paper compares time-frequency representations such as STFT spectrograms, CQT, and CWT as CNN inputs for environmental sound classification. It finds that the choice of transformation affects accuracy, with Mel-scaled STFT performing slightly better than the alternatives.
Contribution
It systematically evaluates the impact of different signal processing methods on CNN-based environmental sound classification, highlighting the importance of representation choice.
Findings
Mel-scaled STFT slightly outperforms other methods
Optimal window size depends on audio signal characteristics
2D convolution generally yields better results than 1D
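The window-size finding reflects the usual STFT trade-off: a longer window gives finer frequency bins but coarser time localization. The sketch below only illustrates that arithmetic. The sample rate, the window sizes, and the helper name `stft_resolution` are illustrative assumptions, not settings taken from the paper.

```python
def stft_resolution(sample_rate, n_fft, hop_length):
    """Return basic resolution figures for an STFT with the given settings.

    Hypothetical helper for illustration; the values describe the analysis
    grid only and do not compute any transform.
    """
    return {
        # Duration of audio covered by one analysis window.
        "window_ms": 1000.0 * n_fft / sample_rate,
        # Time step between consecutive frames.
        "hop_ms": 1000.0 * hop_length / sample_rate,
        # Spacing between adjacent frequency bins.
        "bin_hz": sample_rate / n_fft,
        # Number of non-negative frequency bins for a real-valued signal.
        "n_bins": n_fft // 2 + 1,
    }


if __name__ == "__main__":
    sample_rate = 22050  # assumed sample rate, not from the paper
    for n_fft in (256, 1024, 4096):  # assumed window sizes
        r = stft_resolution(sample_rate, n_fft, hop_length=n_fft // 2)
        print(
            f"n_fft={n_fft:5d}  window={r['window_ms']:7.2f} ms  "
            f"bin width={r['bin_hz']:6.2f} Hz  bins={r['n_bins']}"
        )
```

Short windows favor transient sounds, where timing matters most, and long windows favor stationary or tonal sounds, where pitch detail matters most. This is one plausible reading of why no single window size suits every signal.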
Abstract
Recent successful applications of convolutional neural networks (CNNs) to audio classification and speech recognition have motivated the search for better input representations for more efficient training. Visual displays of an audio signal, through various time-frequency representations such as spectrograms, offer a rich representation of the temporal and spectral structure of the original signal. In this letter, we compare various popular signal processing methods to obtain this representation, such as short-time Fourier transform (STFT) with linear and Mel scales, constant-Q transform (CQT) and continuous wavelet transform (CWT), and assess their impact on the classification performance of two environmental sound datasets using CNNs. This study supports the hypothesis that time-frequency representations are valuable in learning useful features for sound classification. Moreover, the…
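One practical difference between the representations named in the abstract is how they space the frequency axis. The following sketch compares the center frequencies of linear STFT bins, Mel bands, and CQT bins. It assumes the common HTK-style Mel formula, and the function names and parameter values are illustrative choices; the paper's own settings are not given in this summary.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to Mel using the HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def linear_bin_centers(sample_rate, n_fft):
    """STFT bin centers: evenly spaced from 0 Hz up to the Nyquist frequency."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def mel_band_centers(n_mels, f_min, f_max):
    """Mel band centers: evenly spaced in Mel, so increasingly wide in Hz."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


def cqt_bin_centers(n_bins, f_min, bins_per_octave):
    """CQT bin centers: geometrically spaced, constant ratio between neighbors."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


if __name__ == "__main__":
    # All values below are illustrative assumptions.
    lin = linear_bin_centers(sample_rate=22050, n_fft=1024)
    mel = mel_band_centers(n_mels=8, f_min=0.0, f_max=11025.0)
    cqt = cqt_bin_centers(n_bins=8, f_min=55.0, bins_per_octave=2)
    print("linear:", [round(f, 1) for f in lin[:8]])
    print("mel:   ", [round(f, 1) for f in mel])
    print("cqt:   ", [round(f, 1) for f in cqt])
```

The linear axis spends equal resolution everywhere, while the Mel and CQT axes concentrate resolution at low frequencies.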
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Convolution
