Comparing Time and Frequency Domain for Audio Event Recognition Using Deep Learning
Lars Hertel, Huy Phan, Alfred Mertins

TL;DR
This study compares deep-learning-based audio event recognition in the time domain and the frequency domain. It finds that frequency-domain features are more effective and that adding convolutional layers improves performance further, yielding state-of-the-art results.
Contribution
The paper shows that frequency-domain features outperform time-domain features for deep-learning-based audio event recognition, and it highlights the additional benefit of convolutional layers.
Findings
Frequency-domain features lead to better recognition accuracy than time-domain features.
Convolutional and pooling layers significantly improve recognition performance.
The resulting networks achieve state-of-the-art results on benchmark datasets.
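To make the second finding concrete, here is a minimal NumPy sketch of what a convolution followed by max pooling does to a one-dimensional input. It is an illustration only, not the authors' architecture: the input values, the hand-picked kernel, and the helper names `conv1d_valid` and `max_pool1d` are all assumptions made for this example. In a trained network the kernel weights would be learned.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide the kernel over x without padding (cross-correlation, as in most deep learning libraries)."""
    return np.correlate(x, kernel, mode="valid")

def max_pool1d(x, size):
    """Keep the maximum of each non-overlapping window of the given size."""
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)

# Hypothetical input with two local peaks (for example, a slice of a spectrum).
x = np.array([0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0])

# Hand-picked kernel that responds to local peaks; assumed for illustration.
kernel = np.array([-1.0, 2.0, -1.0])

# Convolution, then ReLU, then pooling with window size 2.
feature_map = np.maximum(conv1d_valid(x, kernel), 0.0)
pooled = max_pool1d(feature_map, size=2)

print(feature_map.tolist())
print(pooled.tolist())
```

The convolution responds only where the local pattern matches the kernel, and pooling keeps that response while halving the resolution, which is the sense in which such layers exploit local structure in the signal.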
Abstract
Recognizing acoustic events is an intricate problem for a machine and an emerging field of research. Deep neural networks achieve convincing results and are currently the state-of-the-art approach for many tasks. One advantage is their implicit feature learning, as opposed to explicit feature extraction from the input signal. In this work, we analyzed whether more discriminative features can be learned from the time-domain or the frequency-domain representation of the audio signal. For this purpose, we trained multiple deep networks with different architectures on the Freiburg-106 and ESC-10 datasets. Our results show that feature learning from the frequency domain is superior to feature learning from the time domain. Moreover, additionally using convolution and pooling layers, to explore local structures of the audio signal, significantly improves the recognition performance and achieves…
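The two input representations compared in the abstract can be sketched as follows. This is a minimal NumPy example under stated assumptions, not the authors' preprocessing: the 16 kHz sampling rate, the frame length of 512 samples, the hop of 256 samples, the Hann window, the log compression, and the helper names `frame_signal` and `log_magnitude_spectrum` are all chosen here for illustration, and the test signal is a synthetic tone rather than data from Freiburg-106 or ESC-10.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (time-domain input)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def log_magnitude_spectrum(frames):
    """Map each frame to its log-magnitude spectrum (frequency-domain input)."""
    window = np.hanning(frames.shape[1])
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.log1p(np.abs(spectrum))

# Synthetic one-second 440 Hz tone at an assumed 16 kHz sampling rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

# Time-domain features: raw samples per frame.
time_features = frame_signal(signal, frame_len=512, hop=256)

# Frequency-domain features: one spectrum per frame.
freq_features = log_magnitude_spectrum(time_features)

print(time_features.shape, freq_features.shape)
```

Either array could serve as network input. In the frequency-domain version the tone's energy is concentrated in a few neighboring bins of every frame, whereas in the time-domain version the same information is spread across all samples of the frame.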
Taxonomy
Methods: Convolution
