Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations

Gabriel Meseguer-Brocal and Geoffroy Peeters

arXiv:1907.01277·eess.AS·November 22, 2019·37 cites

TL;DR

The paper introduces Conditioned-U-Net, a versatile model that uses a control mechanism to perform multiple instrument separations within a single network, matching specialized models' performance at lower computational cost.

Contribution

It proposes the Conditioned-U-Net with a control mechanism using FiLM layers, enabling a single model to perform multiple source separations efficiently.

Findings

01 Achieves comparable performance to dedicated models for multiple instruments.

02 Reduces computational cost by consolidating tasks into one model.

03 Demonstrates effective control via one-hot encoding and FiLM layers.

Abstract

Data-driven models for audio source separation such as U-Net or Wave-U-Net are usually models dedicated to and specifically trained for a single task, e.g. a particular instrument isolation. Training them for various tasks at once commonly results in worse performances than training them for a single specialized task. In this work, we introduce the Conditioned-U-Net (C-U-Net) which adds a control mechanism to the standard U-Net. The control mechanism allows us to train a unique and generic U-Net to perform the separation of various instruments. The C-U-Net decides the instrument to isolate according to a one-hot-encoding input vector. The input vector is embedded to obtain the parameters that control Feature-wise Linear Modulation (FiLM) layers. FiLM layers modify the U-Net feature maps in order to separate the desired instrument via affine transformations. The C-U-Net performs…
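The conditioning path described in the abstract (one-hot vector, embedding, FiLM parameters, affine transformation of feature maps) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the instrument list, the channel count, the single dense layer used as the embedding, and every name below are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical task list and feature-map count (not taken from the paper).
INSTRUMENTS = ["vocals", "drums", "bass", "other"]
N_CHANNELS = 16


def one_hot(instrument):
    """Return the one-hot condition vector that selects one instrument."""
    z = np.zeros(len(INSTRUMENTS))
    z[INSTRUMENTS.index(instrument)] = 1.0
    return z


# Assumed control mechanism: a single dense layer mapping the condition
# vector to one (gamma, beta) pair per feature map. Random, untrained weights.
W = rng.normal(size=(len(INSTRUMENTS), 2 * N_CHANNELS))
b = np.zeros(2 * N_CHANNELS)


def film_parameters(z):
    """Embed the condition vector into per-channel scale and shift values."""
    params = z @ W + b
    return params[:N_CHANNELS], params[N_CHANNELS:]


def film(features, gamma, beta):
    """Apply gamma * x + beta per feature map.

    features has shape (channels, freq, time); gamma and beta have shape
    (channels,) and are broadcast over the frequency and time axes.
    """
    return gamma[:, None, None] * features + beta[:, None, None]


# The same feature maps are modulated differently for each requested source.
features = rng.normal(size=(N_CHANNELS, 8, 8))
out_vocals = film(features, *film_parameters(one_hot("vocals")))
out_drums = film(features, *film_parameters(one_hot("drums")))
```

In this sketch the U-Net weights would be shared across all instruments, and only the scale and shift values change with the condition vector, which is what lets one network serve several separation tasks.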

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net