Source Separation and Depthwise Separable Convolutions for Computer Audition

Gabriel Mersy; Jin Hong Kuan

arXiv:2012.03359·cs.SD·December 8, 2020

Open Access

TL;DR

This paper combines music source separation with a depthwise separable convolutional neural network for music classification, and reports that source separation improves classification performance on a limited electronic dance music (EDM) data set.

Contribution

It presents a feature representation method that combines source separation with depthwise separable CNNs for audio classification in computer audition.

Findings

01 Source separation enhances classification accuracy in limited-data scenarios.

02 Depthwise separable convolutions outperform standard CNNs on spectrograms.

03 The method is effective on challenging EDM datasets.

Abstract

Given recent advances in deep music source separation, we propose a feature representation method that combines source separation with a state-of-the-art representation learning technique that is suitably repurposed for computer audition (i.e. machine listening). We train a depthwise separable convolutional neural network on a challenging electronic dance music (EDM) data set and compare its performance to convolutional neural networks operating on both source separated and standard spectrograms. It is shown that source separation improves classification performance in a limited-data setting compared to the standard single spectrogram approach.
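The abstract names two building blocks, source separation and depthwise separable convolutions, without giving architectural detail. The sketch below illustrates only the second one, in plain NumPy. It assumes, purely for illustration, that the separated sources are stacked as input channels of shape (sources, frequency bins, time frames); that layout, the shapes, the random data, and all function names are hypothetical and are not taken from the paper. As in most deep learning libraries, the "convolution" here is a cross-correlation with valid padding and stride 1.

```python
import numpy as np

def depthwise_conv2d(x, dw_kernels):
    """Apply one k x k filter per input channel (no mixing across channels).

    x: (C, H, W), dw_kernels: (C, k, k). Valid padding, stride 1.
    """
    C, H, W = x.shape
    k = dw_kernels.shape[1]
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw_kernels[c])
    return out

def pointwise_conv2d(x, pw_weights):
    """1 x 1 convolution: mix channels at every position.

    x: (C_in, H, W), pw_weights: (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", pw_weights, x)

def depthwise_separable_conv2d(x, dw_kernels, pw_weights):
    """Depthwise step followed by pointwise step."""
    return pointwise_conv2d(depthwise_conv2d(x, dw_kernels), pw_weights)

def param_counts(c_in, c_out, k):
    """Weights (no biases) of a standard vs. a depthwise separable layer."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable

rng = np.random.default_rng(0)
# Hypothetical input: 4 separated sources, each a 64 x 128 spectrogram.
stems = rng.random((4, 64, 128))
dw = rng.standard_normal((4, 3, 3))   # one 3 x 3 filter per source channel
pw = rng.standard_normal((8, 4))      # mix 4 channels into 8 feature maps

features = depthwise_separable_conv2d(stems, dw, pw)
print(features.shape)            # (8, 62, 126)
print(param_counts(4, 8, 3))     # (288, 68)
```

The depthwise step filters each source channel on its own, and the pointwise step then combines the channels. For this toy layer that factorization needs 68 weights where a standard 3 x 3 convolution would need 288, which is the general reason such layers are attractive when training data is limited.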

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Animal Vocal Communication and Behavior