Combining High-Level Features of Raw Audio Waves and Mel-Spectrograms for Audio Tagging

Marcel Lederle; Benjamin Wilhelm

arXiv:1811.10708 · cs.SD · November 28, 2018 · 6 cites

Open Access

TL;DR

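This paper introduces a single-model approach in which two CNNs learn high-level features from raw audio and log-scaled mel-spectrograms, and densely connected layers combine them. With data augmentation, it ranks among the best two percent in the Freesound General-Purpose Audio Tagging Challenge on Kaggle.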

Contribution

It presents a method that integrates learned features from raw audio and log-scaled mel-spectrograms within a single neural network, instead of combining the predictions of separate models as most ensemble methods do.
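The feature-level combination described above can be sketched as a small NumPy forward pass. This is a minimal illustration, not the authors' network: the layer sizes, kernel shapes, global max pooling, random weights, random placeholder inputs, and the class count `N_CLASSES` are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, kernels, stride=1):
    """Valid 1-D convolution + ReLU. x: (T,), kernels: (F, K) -> (F, T')."""
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1])[::stride]
    return np.maximum(windows @ kernels.T, 0.0).T

def conv2d_relu(x, kernels):
    """Valid 2-D convolution + ReLU. x: (H, W), kernels: (F, kh, kw) -> (F, H', W')."""
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1:])
    return np.maximum(np.einsum("hwij,fij->fhw", windows, kernels), 0.0)

def dense(x, w, b):
    return x @ w + b

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes and random placeholder inputs (not taken from the paper).
N_CLASSES = 41
wave = rng.standard_normal(16000)         # stand-in for one raw audio clip
log_mel = rng.standard_normal((64, 100))  # stand-in for a log-mel spectrogram

params = {
    "raw_k": rng.standard_normal((8, 64)) * 0.1,    # 8 filters over the raw wave
    "mel_k": rng.standard_normal((8, 3, 3)) * 0.1,  # 8 filters over the spectrogram
    "w1": rng.standard_normal((16, 32)) * 0.1, "b1": np.zeros(32),
    "w2": rng.standard_normal((32, N_CLASSES)) * 0.1, "b2": np.zeros(N_CLASSES),
}

def forward(wave, log_mel, p):
    # One branch per input type, each reduced to a feature vector by global max pooling.
    raw_feat = conv1d_relu(wave, p["raw_k"], stride=16).max(axis=1)  # (8,)
    mel_feat = conv2d_relu(log_mel, p["mel_k"]).max(axis=(1, 2))     # (8,)
    # Fusion happens on learned features, not on per-model predictions.
    fused = np.concatenate([raw_feat, mel_feat])                     # (16,)
    hidden = np.maximum(dense(fused, p["w1"], p["b1"]), 0.0)
    return softmax(dense(hidden, p["w2"], p["b2"]))

probs = forward(wave, log_mel, params)
```

The point of the sketch is the `np.concatenate` step: a prediction-level ensemble would instead run two complete classifiers and average their output probabilities.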

Findings

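01. Ranks among the best two percent in the Freesound General-Purpose Audio Tagging Challenge on Kaggle

02. Combines learned high-level features from two audio representations (raw waves and log-scaled mel-spectrograms) rather than model predictions

03. Uses a relatively simple single model together with data augmentation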

Abstract

In this paper, we describe our contribution to Task 2 of the DCASE 2018 Audio Challenge. While it has become ubiquitous to utilize an ensemble of machine learning methods for classification tasks to obtain better predictive performance, the majority of ensemble methods combine predictions rather than learned features. We propose a single-model method that combines learned high-level features computed from log-scaled mel-spectrograms and raw audio data. These features are learned separately by two Convolutional Neural Networks, one for each input type, and then combined by densely connected layers within a single network. This relatively simple approach along with data augmentation ranks among the best two percent in the Freesound General-Purpose Audio Tagging Challenge on Kaggle.
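For readers unfamiliar with the second input type, a log-scaled mel-spectrogram can be computed roughly as follows. This is a sketch under stated assumptions: the sample rate, FFT size, hop length, number of mel bands, Hann window, and the HTK-style mel formula are illustrative choices, not the settings used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale, one common convention (an assumption here).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale: (n_mels, n_fft // 2 + 1)."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, ctr, hi = hz_pts[m:m + 3]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=256, n_mels=64, eps=1e-10):
    """Return an (n_mels, n_frames) array of log mel-band energies."""
    frames = np.lib.stride_tricks.sliding_window_view(wave, n_fft)[::hop]
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + eps).T  # eps avoids log(0) on silent frames

# Usage: one second of a 440 Hz tone at the assumed 16 kHz sample rate.
sr = 16000
wave = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
spec = log_mel_spectrogram(wave, sr=sr)
```

The resulting 2-D array is what a spectrogram CNN would consume, while the raw-audio CNN would receive `wave` directly.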

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies