Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?
Monika Doerfler, Thomas Grill, Roswitha Bammer, Arthur Flexer

TL;DR
This paper investigates whether CNNs for music tasks learn better when the usual spectrogram or mel-spectrogram pre-processing is replaced by adaptive or learned filters applied directly to the raw audio. In experiments on singing voice detection, the adaptive filters improve performance.
Contribution
It provides a theoretical and experimental comparison between traditional spectrogram features and adaptive filter banks for CNN-based music classification.
Findings
Adaptive filter banks can approximate mel-spectrograms.
Adaptive features outperform traditional spectrograms in singing voice detection.
Learned filter parameters perform as well as fixed adaptive filters.
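To make the first finding concrete, here is a minimal sketch of a time-domain filter bank followed by time-averaging. It is not the authors' implementation: the Hann-windowed complex exponentials, the fixed filter length, the common mel-scale formula, and all function and parameter names (`modulated_window_bank`, `filter_and_average`, `avg_len`, `hop`) are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel-scale formula (an assumption; the paper may use another variant).
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_frequencies(n_bands, f_min, f_max):
    # Centre frequencies equally spaced on the mel scale, returned in Hz.
    return mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_bands))

def modulated_window_bank(center_freqs_hz, sample_rate, length=1024):
    # Hypothetical filter design: one Hann window modulated to each centre
    # frequency, giving complex band-pass filters of identical length.
    t = np.arange(length) - length // 2
    window = np.hanning(length)
    window /= window.sum()
    return np.stack([window * np.exp(2j * np.pi * fc * t / sample_rate)
                     for fc in center_freqs_hz])

def filter_and_average(signal, filters, avg_len=512, hop=256):
    # Filter the raw signal with every band-pass filter, take the squared
    # magnitude, then average over windows of avg_len samples every hop samples.
    n_frames = 1 + (len(signal) - avg_len) // hop
    coeffs = np.empty((len(filters), n_frames))
    for k, h in enumerate(filters):
        energy = np.abs(np.convolve(signal, h, mode="same")) ** 2
        for n in range(n_frames):
            coeffs[k, n] = energy[n * hop:n * hop + avg_len].mean()
    return coeffs

# Usage: a pure tone at one of the centre frequencies.
sample_rate = 16000
centers = mel_spaced_frequencies(20, 100.0, 4000.0)
t = np.arange(sample_rate // 2) / sample_rate
tone = np.sin(2 * np.pi * centers[5] * t)
coeffs = filter_and_average(tone, modulated_window_bank(centers, sample_rate))
print(coeffs.shape, int(np.argmax(coeffs.mean(axis=1))))
```

The output has one row per band and one column per averaging window, the same layout as a mel-spectrogram, which is what allows it to be fed to the same CNN.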
Abstract
When convolutional neural networks are used to tackle learning problems based on music or, more generally, time series data, raw one-dimensional data are commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients, which are then used as input to the actual neural network. In this contribution, we investigate, both theoretically and experimentally, the influence of this pre-processing step on the network's performance and pose the question of whether replacing it by applying adaptive or learned filters directly to the raw data can improve learning success. The theoretical results show that approximately reproducing mel-spectrogram coefficients by applying adaptive filters and subsequent time-averaging is in principle possible. We also conducted extensive experimental work on the task of singing voice detection in music. The results of these experiments show that for…
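For reference, the conventional pre-processing step the abstract describes can be sketched as a short-time Fourier transform followed by triangular mel weighting. This is a generic textbook construction, not code from the paper: the mel formula, the Hann window, and the parameter values (`n_fft`, `hop`, `n_mels`) are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel-scale formula (assumed; other variants exist).
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_matrix(n_mels, n_fft, sample_rate):
    # Triangular weights whose corner points are equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                                  n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    weights = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, ctr, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        weights[m] = np.maximum(0.0, np.minimum(rising, falling))
    return weights

def mel_spectrogram(signal, sample_rate, n_fft=1024, hop=256, n_mels=40):
    # Windowed frames -> power spectrum -> mel-weighted sums per frame.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return mel_filter_matrix(n_mels, n_fft, sample_rate) @ power.T

# Usage: one second of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
mel = mel_spectrogram(np.sin(2 * np.pi * 1000.0 * t), sample_rate)
print(mel.shape)  # (n_mels, n_frames)
```

The resulting array of shape `(n_mels, n_frames)` is what would normally be passed (often after a logarithm) to the CNN as a two-dimensional input.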
Taxonomy
Topics: Neural Networks and Applications · Music and Audio Processing · Speech and Audio Processing
