Audio query-based music source separation

Jie Hwan Lee; Hyeong-Seok Choi; Kyogu Lee

arXiv:1908.06593·cs.SD·August 20, 2019·19 cites

TL;DR

This paper introduces an audio query-based music source separation network that can separate multiple sources from a mixture using a query signal, with the ability to generate continuous outputs through latent space interpolation.

Contribution

It proposes a novel network architecture with a Query-net and Separator that explicitly encodes source information from a query, enabling flexible separation of multiple sources.

Findings

01

Successfully separates multiple sources with a single network.

02

Can generate continuous outputs via latent vector interpolation.

03

Performs well on the MUSDB18 dataset.
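The latent vector interpolation mentioned in the findings can be illustrated with a minimal sketch. It assumes plain linear interpolation between two latent vectors; the page does not state which interpolation scheme the authors use, and the function name `interpolate_latents` and the example vectors are hypothetical.

```python
from typing import List


def interpolate_latents(z_a: List[float], z_b: List[float], alpha: float) -> List[float]:
    """Linearly blend two latent vectors (assumed scheme, not confirmed by the source).

    alpha = 0.0 returns z_a, alpha = 1.0 returns z_b, values in between mix the two.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


# Hypothetical latent vectors, e.g. as encoded from two different query signals.
z_first = [1.0, 0.0, 2.0]
z_second = [0.0, 4.0, 2.0]

# A point halfway between the two latent vectors.
z_mid = interpolate_latents(z_first, z_second, 0.5)
```

Sweeping `alpha` from 0 to 1 and feeding each blended vector to a mask estimator would, under this assumption, give a gradual transition between the two separation targets.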

Abstract

In recent years, music source separation has been one of the most intensively studied research areas in music information retrieval. Improvements in deep learning have led to substantial progress in music source separation performance. However, most previous studies are restricted to separating a limited number of sources, such as vocals, drums, bass, and other. In this study, we propose a network for audio query-based music source separation that can explicitly encode the source information from a query signal regardless of the number and/or kind of target signals. The proposed method consists of a Query-net and a Separator: given a query and a mixture, the Query-net encodes the query into the latent space, and the Separator estimates masks conditioned on the latent vector, which are then applied to the mixture for separation. The Separator can also generate masks using the latent…
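The data flow described in the abstract (query to latent vector, latent vector to mask, mask applied to the mixture) can be sketched roughly as follows. This is a toy illustration only: the real Query-net and Separator are trained neural networks, whereas the stand-ins below are hand-written functions, and every name here (`query_net`, `separator`, `apply_mask`, `separate`) is hypothetical.

```python
import math
from typing import List

# Magnitude spectrogram laid out as [time frames][frequency bins].
Spectrogram = List[List[float]]


def query_net(query: Spectrogram) -> List[float]:
    """Toy stand-in for the Query-net: average the query over time
    to obtain a single latent vector (one value per frequency bin)."""
    n_frames = len(query)
    n_bins = len(query[0])
    return [sum(frame[k] for frame in query) / n_frames for k in range(n_bins)]


def separator(mixture: Spectrogram, latent: List[float]) -> Spectrogram:
    """Toy stand-in for the Separator: build a mask with values in (0, 1),
    conditioned on the latent vector. This toy version uses only the
    mixture's number of frames; a real Separator would also analyse its content."""
    mean_latent = sum(latent) / len(latent)
    row = [1.0 / (1.0 + math.exp(-(z - mean_latent))) for z in latent]
    return [list(row) for _ in mixture]


def apply_mask(mixture: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Element-wise product of mask and mixture gives the source estimate."""
    return [[m * x for m, x in zip(m_row, x_row)] for m_row, x_row in zip(mask, mixture)]


def separate(mixture: Spectrogram, query: Spectrogram) -> Spectrogram:
    latent = query_net(query)          # encode the query into the latent space
    mask = separator(mixture, latent)  # estimate a mask conditioned on the latent
    return apply_mask(mixture, mask)   # apply the mask to the mixture


# Hypothetical inputs: the query has energy only in the first frequency bin.
mixture = [[1.0, 1.0], [2.0, 2.0]]
query = [[4.0, 0.0], [2.0, 0.0]]
estimate = separate(mixture, query)
```

Because the query's energy is concentrated in the first bin, this toy mask keeps more of the mixture's first bin than its second, which mirrors the idea that the query determines what is extracted.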

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis