Source Separation and Depthwise Separable Convolutions for Computer Audition
Gabriel Mersy, Jin Hong Kuan

TL;DR
This paper combines music source separation with a depthwise separable convolutional neural network for music classification, and reports better classification performance on a limited electronic dance music (EDM) data set than the standard single-spectrogram approach.
Contribution
A feature representation method that pairs source-separated spectrograms with a depthwise separable CNN for audio classification in computer audition.
Findings
Source separation enhances classification accuracy in limited-data scenarios.
Depthwise separable convolutions outperform standard CNNs on spectrograms.
The method is effective on a challenging EDM data set.
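To make the appeal of depthwise separable convolutions concrete, the following minimal sketch compares weight counts for a single layer. The layer sizes (4 input channels, 32 filters, 3 x 3 kernels) are illustrative assumptions and are not taken from the paper; biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 4 input channels (e.g. one per separated source),
# 32 output filters, 3 x 3 kernels. These sizes are assumptions.
std = standard_conv_params(4, 32, 3)
sep = depthwise_separable_params(4, 32, 3)
print(std, sep)  # prints: 1152 164
```

Under these assumed sizes the separable layer uses roughly a seventh of the weights, which is one plausible reason such layers suit a limited-data setting.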
Abstract
Given recent advances in deep music source separation, we propose a feature representation method that combines source separation with a state-of-the-art representation learning technique that is suitably repurposed for computer audition (i.e. machine listening). We train a depthwise separable convolutional neural network on a challenging electronic dance music (EDM) data set and compare its performance to convolutional neural networks operating on both source separated and standard spectrograms. It is shown that source separation improves classification performance in a limited-data setting compared to the standard single spectrogram approach.
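The abstract does not spell out the network's layers, so the following is only a sketch of the general technique, not the authors' implementation. It assumes that each separated source contributes one spectrogram, that these are stacked as input channels, and that the convolution is a stride-1 "valid" cross-correlation. All function names here are hypothetical.

```python
def depthwise_conv2d(x, kernels):
    """Filter each channel with its own k x k kernel.

    x: list of C channels, each an H x W list of lists.
    kernels: list of C kernels, each a k x k list of lists.
    Uses 'valid' cross-correlation with stride 1 (an assumption).
    """
    out = []
    for chan, ker in zip(x, kernels):
        k = len(ker)
        h, w = len(chan), len(chan[0])
        out.append([
            [sum(chan[i + a][j + b] * ker[a][b]
                 for a in range(k) for b in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)
        ])
    return out


def pointwise_conv2d(x, weights):
    """1 x 1 convolution: mix channels at every position.

    weights: list of C_out rows, each holding C_in mixing weights.
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wc * x[c][i][j] for c, wc in enumerate(row))
          for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def depthwise_separable_conv2d(x, kernels, weights):
    """Depthwise step followed by pointwise step."""
    return pointwise_conv2d(depthwise_conv2d(x, kernels), weights)


# Toy input: two 3 x 3 "spectrograms", standing in for two separated sources.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]  # one 2 x 2 kernel per channel
weights = [[1, 1], [1, -1]]                     # two output channels
y = depthwise_separable_conv2d(x, kernels, weights)
print(y)  # prints: [[[10, 12], [16, 18]], [[2, 4], [8, 10]]]
```

The depthwise step never mixes sources, so each separated source is filtered independently; only the pointwise step combines them across channels.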
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Animal Vocal Communication and Behavior
