Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations
Gabriel Meseguer-Brocal and Geoffroy Peeters

TL;DR
The paper introduces Conditioned-U-Net, a versatile model that uses a control mechanism to perform multiple instrument separations within a single network, matching specialized models' performance at lower computational cost.
Contribution
It proposes the Conditioned-U-Net with a control mechanism using FiLM layers, enabling a single model to perform multiple source separations efficiently.
Findings
Achieves comparable performance to dedicated models for multiple instruments.
Reduces computational cost by consolidating tasks into one model.
Demonstrates effective control via one-hot encoding and FiLM layers.
Abstract
Data-driven models for audio source separation such as U-Net or Wave-U-Net are usually models dedicated to and specifically trained for a single task, e.g. a particular instrument isolation. Training them for various tasks at once commonly results in worse performances than training them for a single specialized task. In this work, we introduce the Conditioned-U-Net (C-U-Net) which adds a control mechanism to the standard U-Net. The control mechanism allows us to train a unique and generic U-Net to perform the separation of various instruments. The C-U-Net decides the instrument to isolate according to a one-hot-encoding input vector. The input vector is embedded to obtain the parameters that control Feature-wise Linear Modulation (FiLM) layers. FiLM layers modify the U-Net feature maps in order to separate the desired instrument via affine transformations. The C-U-Net performs…
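The control path described in the abstract (one-hot vector, embedding, FiLM parameters, affine modulation of feature maps) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the instrument list, the single linear embedding, the per-channel scaling, and every name below (`INSTRUMENTS`, `ConditionGenerator`, `film`) are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

# Hypothetical label set; the excerpt above does not list the instruments.
INSTRUMENTS = ["vocals", "drums", "bass", "other"]


def one_hot(index, size):
    """Return a one-hot vector selecting the instrument to isolate."""
    z = np.zeros(size, dtype=np.float32)
    z[index] = 1.0
    return z


class ConditionGenerator:
    """Assumed embedding: one linear map from the one-hot vector to (gamma, beta)."""

    def __init__(self, n_conditions, n_channels, seed=0):
        rng = np.random.default_rng(seed)
        # Random stand-ins for learned weights (gamma near 1, beta near 0).
        self.w_gamma = rng.normal(1.0, 0.1, (n_conditions, n_channels)).astype(np.float32)
        self.w_beta = rng.normal(0.0, 0.1, (n_conditions, n_channels)).astype(np.float32)

    def __call__(self, z):
        # For a one-hot z this selects one row of each weight matrix.
        return z @ self.w_gamma, z @ self.w_beta


def film(x, gamma, beta):
    """Affine modulation of feature maps.

    x: (channels, height, width); gamma, beta: (channels,).
    Each channel is scaled by its gamma and shifted by its beta.
    """
    return gamma[:, None, None] * x + beta[:, None, None]


# Usage: the same feature maps, modulated for two different target instruments.
features = np.ones((8, 4, 4), dtype=np.float32)
generator = ConditionGenerator(len(INSTRUMENTS), n_channels=8)

gamma, beta = generator(one_hot(INSTRUMENTS.index("drums"), len(INSTRUMENTS)))
drums_features = film(features, gamma, beta)

gamma_v, beta_v = generator(one_hot(INSTRUMENTS.index("vocals"), len(INSTRUMENTS)))
vocals_features = film(features, gamma_v, beta_v)
```

In this sketch the U-Net weights would stay the same for every instrument; only the one-hot input changes, which in turn changes the scale and shift applied to the feature maps.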
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net
