Audio Source Separation Using a Deep Autoencoder
Giljin Jang, Han-Gyu Kim, Yung-Hwan Oh

TL;DR
This paper introduces a deep autoencoder-based unsupervised method for audio source separation, leveraging clustering of code layer activations to segregate and reconstruct unknown source signals from mixed audio inputs.
Contribution
It presents a novel unsupervised framework that uses deep autoencoders and clustering in the code layer for separating unknown audio sources.
Findings
Separation results on mixed audio signals are promising, although the restored sounds are not perfect.
Autoencoder weight analysis reveals primitive frequency components.
Clustering effectively segregates different source signals.
Abstract
This paper proposes a novel framework for unsupervised audio source separation using a deep autoencoder. The characteristics of the unknown source signals in the mixed input are automatically learned by properly configured autoencoders implemented as a network with many layers, and the sources are separated by clustering the coefficient vectors in the code layer. By investigating the weight vectors to the final target representation layer, the primitive components of the audio signals in the frequency domain are observed. By clustering the activation coefficients in the code layer, the previously unknown source signals are segregated. The original source sounds are then separated and reconstructed using the code vectors that belong to different clusters. The restored sounds are not perfect, but the results are promising for many practical applications.
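The pipeline described in the abstract (encode, cluster the code-layer activations, reconstruct each source from its own cluster) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it swaps the deep network for a one-hidden-layer autoencoder, uses plain k-means as the clustering step, and assumes one code vector per spectrogram frame with each frame assigned to a single source. The function names (train_autoencoder, kmeans, separate), the hyperparameters, and the toy data are all hypothetical.

```python
import numpy as np

def train_autoencoder(X, code_dim=4, epochs=200, lr=0.05, seed=0):
    """Shallow stand-in for the deep autoencoder: tanh code layer, linear decoder."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, code_dim)); b1 = np.zeros(code_dim)
    W2 = rng.normal(0, 0.1, (code_dim, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # code-layer activations
        Y = H @ W2 + b2                    # reconstruction of the input frames
        dY = 2.0 * (Y - X) / n             # gradient of the squared error, averaged over frames
        dH = (dY @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ dY); b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def kmeans(Z, k, iters=50, seed=0):
    """Plain k-means on the rows of Z; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def separate(X, k=2):
    """Cluster code vectors, then keep each cluster's decoded frames as one source."""
    W1, b1, W2, b2 = train_autoencoder(X)
    H = np.tanh(X @ W1 + b1)
    labels = kmeans(H, k)
    full = H @ W2 + b2
    # Assumption: frames outside a cluster contribute silence to that source.
    sources = [np.where((labels == j)[:, None], full, 0.0) for j in range(k)]
    return labels, full, sources

# Toy magnitude "spectrogram": 40 frames with a low-frequency peak, 40 with a high one.
rng = np.random.default_rng(1)
bins = np.arange(32)
low = np.exp(-0.5 * ((bins - 6) / 2.0) ** 2)
high = np.exp(-0.5 * ((bins - 24) / 2.0) ** 2)
X = np.vstack([low + 0.01 * rng.random(32) for _ in range(40)] +
              [high + 0.01 * rng.random(32) for _ in range(40)])

labels, full, sources = separate(X, k=2)
```

Because every frame goes to exactly one cluster, the per-source reconstructions add up to the full reconstruction. A hard per-frame assignment like this only separates sources that dominate different frames; handling sources that overlap in time would need a finer-grained use of the code layer than this sketch attempts.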
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
