Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?

Monika Doerfler; Thomas Grill; Roswitha Bammer; Arthur Flexer

arXiv:1709.02291 · cs.LG · September 20, 2018 · 1 citation

Open Access

TL;DR

This paper investigates whether adaptive or learned filters applied directly to raw music data can replace the usual spectrogram or mel-spectrogram pre-processing in CNNs, and reports that adaptive filter banks improve singing voice detection.

Contribution

It provides a theoretical and experimental comparison between traditional spectrogram features and adaptive filter banks for CNN-based music classification.

Findings

01

Adaptive filters followed by time-averaging can approximately reproduce mel-spectrogram coefficients.

02

Adaptive features outperform traditional spectrograms in singing voice detection.

03

Learned filter parameters perform as well as fixed adaptive filters.
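
The first finding can be illustrated on a single frame. The sketch below is not the authors' construction: it is a simplified, frame-wise check using NumPy, and it assumes hypothetical helper names (`triangular_filters`, `band_energies_two_ways`), linearly spaced triangular weights instead of true mel spacing, a Hann window, and circular rather than linear filtering. Under those assumptions, Parseval's identity makes the two routes agree exactly: weighting the power spectrum gives the same number as filtering the frame and summing the squared output over time.

```python
import numpy as np


def triangular_filters(n_fft, n_bands):
    """Hypothetical triangular band weights on the full DFT grid.

    Bands are linearly spaced for brevity (an assumption; mel spacing would
    work the same way). Weights are mirrored onto negative frequencies so the
    matching impulse responses are real-valued. Assumes n_fft is even.
    """
    n_bins = n_fft // 2 + 1
    edges = np.linspace(0, n_bins - 1, n_bands + 2)
    k = np.arange(n_bins)
    bank = np.zeros((n_bands, n_fft))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        tri = np.minimum((k - lo) / (mid - lo), (hi - k) / (hi - mid))
        tri = np.clip(tri, 0.0, None)
        bank[b, :n_bins] = tri
        bank[b, n_bins:] = tri[1:-1][::-1]  # mirror for negative frequencies
    return bank


def band_energies_two_ways(frame, bank):
    """Return band energies computed spectrally and by filtering in time."""
    n = len(frame)
    windowed = frame * np.hanning(n)
    spectrum = np.fft.fft(windowed)

    # Route 1: weight the power spectrum, as in a mel-spectrogram coefficient.
    spectral = bank @ np.abs(spectrum) ** 2

    # Route 2: filter the windowed frame, then aggregate squared output in time.
    # Each filter has magnitude response sqrt(weight); the circular convolution
    # is carried out via the FFT.
    impulse_responses = np.fft.ifft(np.sqrt(bank), axis=1).real
    filtered = np.fft.ifft(np.fft.fft(impulse_responses, axis=1) * spectrum, axis=1)
    temporal = n * np.sum(np.abs(filtered) ** 2, axis=1)

    return spectral, temporal


rng = np.random.default_rng(0)
frame = rng.standard_normal(512)
bank = triangular_filters(512, 8)
spectral, temporal = band_energies_two_ways(frame, bank)
print(np.allclose(spectral, temporal))
```

The exact match here relies on the circular, per-frame setup. Filtering the raw signal once and then averaging over time, as the abstract describes, is only approximately equivalent, which is why the finding is phrased in terms of approximation.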

Abstract

When convolutional neural networks are used to tackle learning problems based on music or, more generally, time series data, raw one-dimensional data are commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients, which are then used as input to the actual neural network. In this contribution, we investigate, both theoretically and experimentally, the influence of this pre-processing step on the network's performance and pose the question of whether replacing it by applying adaptive or learned filters directly to the raw data can improve learning success. The theoretical results show that approximately reproducing mel-spectrogram coefficients by applying adaptive filters and subsequent time-averaging is in principle possible. We also conducted extensive experimental work on the task of singing voice detection in music. The results of these experiments show that for…

Taxonomy

Topics: Neural Networks and Applications · Music and Audio Processing · Speech and Audio Processing