Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks

M. Huzaifah

arXiv:1706.07156·cs.CV·June 23, 2017·122 cites

TL;DR

This paper compares time-frequency representations for environmental sound classification with CNNs: STFT spectrograms on linear and Mel scales, the constant-Q transform (CQT), and the continuous wavelet transform (CWT). The choice of transformation affects classification accuracy, with Mel-scaled STFT performing slightly better than the alternatives.

Contribution

It systematically evaluates the impact of different signal processing methods on CNN-based environmental sound classification, highlighting the importance of representation choice.

Findings

01. Mel-scaled STFT slightly outperforms other methods

02. Optimal window size depends on audio signal characteristics

03. 2D convolution generally yields better results than 1D
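
The second finding rests on the usual STFT trade-off: a longer window resolves frequency more finely but smears events in time. A minimal sketch of that arithmetic follows. The 22050 Hz sample rate and the window sizes are illustrative assumptions, not values taken from the paper, and `stft_resolution` is a hypothetical helper name.

```python
def stft_resolution(sample_rate, window_size):
    """Return (window duration in ms, frequency-bin spacing in Hz)."""
    duration_ms = 1000.0 * window_size / sample_rate
    bin_spacing_hz = sample_rate / window_size
    return duration_ms, bin_spacing_hz


# Assumed sample rate, chosen only for illustration.
SAMPLE_RATE = 22050

for window_size in (256, 1024, 4096):
    ms, hz = stft_resolution(SAMPLE_RATE, window_size)
    print(f"window={window_size:5d}  duration={ms:7.2f} ms  bin spacing={hz:7.2f} Hz")
```

Short, transient sounds tend to favour the smaller windows in this table, while sustained tonal sounds tend to favour the larger ones, which is one plausible reading of why the best window size varies with the signal.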

Abstract

Recent successful applications of convolutional neural networks (CNNs) to audio classification and speech recognition have motivated the search for better input representations for more efficient training. Visual displays of an audio signal through various time-frequency representations, such as spectrograms, offer a rich representation of the temporal and spectral structure of the original signal. In this letter, we compare various popular signal processing methods to obtain this representation, such as short-time Fourier transform (STFT) with linear and Mel scales, constant-Q transform (CQT) and continuous wavelet transform (CWT), and assess their impact on the classification performance of two environmental sound datasets using CNNs. This study supports the hypothesis that time-frequency representations are valuable in learning useful features for sound classification. Moreover, the…
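
To make the Mel-scaled STFT mentioned in the abstract concrete, here is a minimal standard-library sketch: a Hann-windowed power spectrogram followed by a triangular Mel filterbank. It is not the paper's pipeline. The Mel formula used is the common 2595·log10(1 + f/700) variant, and every parameter (`n_fft`, `hop`, `n_mels`, the 8 kHz sample rate) and function name is an assumption chosen to keep the example small.

```python
import cmath
import math


def hz_to_mel(hz):
    # Common Mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def stft_power(signal, n_fft, hop):
    """Power spectrogram: one list of n_fft // 2 + 1 bins per frame."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]
    n_bins = n_fft // 2 + 1
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(n_fft)]
        frame = []
        for k in range(n_bins):
            # Naive DFT, O(n_fft^2) per frame; acceptable for a sketch.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n in range(n_fft))
            frame.append(abs(acc) ** 2)
        frames.append(frame)
    return frames


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Apply the Mel filterbank to each STFT power frame."""
    power = stft_power(signal, n_fft, hop)
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    return [[sum(w * p for w, p in zip(row, frame)) for row in bank]
            for frame in power]


# Usage: a 1 kHz test tone sampled at 8 kHz (illustrative values).
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE) for n in range(256)]
spec = mel_spectrogram(tone, SAMPLE_RATE)
print(len(spec), "frames x", len(spec[0]), "Mel bands")
```

The resulting frames-by-bands matrix is the kind of 2D "image" a CNN would consume. In practice a log or decibel scaling is usually applied before training, and a library FFT would replace the naive DFT.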

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies

Methods: Convolution