# Semi-Supervised Monaural Singing Voice Separation With a Masking Network Trained on Synthetic Mixtures

**Authors:** Michael Michelashvili, Sagie Benaim, Lior Wolf

arXiv: 1812.06087 · 2019-05-07

## TL;DR

This paper introduces a semi-supervised approach to monaural singing voice separation that uses a masking network trained on instrumental samples and synthetic mixtures. It performs on a par with or better than fully supervised methods.

## Contribution

The method employs a single mapping function g, trained on purely instrumental samples and on synthetic mixtures created by mixing reconstructed singing voices with random instrumental samples. No unmixed singing recordings are needed for training.

## Key findings

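- Performs on a par with or better than fully supervised methods, which are also given training samples of unmixed singing voices
- Outperforms other recent semi-supervised methods
- Trains on mixed music and an unmatched set of instrumental music, without isolated singing voices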

## Abstract

We study the problem of semi-supervised singing voice separation, in which the training data contains a set of samples of mixed music (singing and instrumental) and an unmatched set of instrumental music. Our solution employs a single mapping function g, which, applied to a mixed sample, recovers the underlying instrumental music, and, applied to an instrumental sample, returns the same sample. The network g is trained using purely instrumental samples, as well as on synthetic mixed samples that are created by mixing reconstructed singing voices with random instrumental samples. Our results indicate that we are on a par with or better than fully supervised methods, which are also provided with training samples of unmixed singing voices, and are better than other recent semi-supervised methods.
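
The training signal described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes that g acts as an element-wise mask on a non-negative magnitude spectrogram, that the singing voice is reconstructed as the residual of the mixture after removing the estimated instrumental part, and that a mean squared error is used for both objectives. The names `apply_mask`, `make_synthetic_pair`, and `training_losses` are hypothetical, and the random arrays only stand in for real spectrograms.

```python
import numpy as np


def apply_mask(mask_fn, x):
    """Assumed form of g: g(x) = m(x) * x, with mask values clipped to [0, 1]."""
    mask = np.clip(mask_fn(x), 0.0, 1.0)
    return mask * x


def make_synthetic_pair(mask_fn, real_mix, instrumental):
    """Build one (synthetic mixture, instrumental target) training pair."""
    est_instrumental = apply_mask(mask_fn, real_mix)
    # Assumption: the reconstructed singing voice is the residual of the mixture.
    est_singing = real_mix - est_instrumental
    # Mix the reconstructed voice with an unrelated instrumental sample.
    synthetic_mix = est_singing + instrumental
    return synthetic_mix, instrumental


def training_losses(mask_fn, real_mix, instrumental):
    """Return (separation loss, identity loss); MSE is an assumption."""
    synthetic_mix, target = make_synthetic_pair(mask_fn, real_mix, instrumental)
    # g applied to a mixed sample should recover the underlying instrumental.
    separation = np.mean((apply_mask(mask_fn, synthetic_mix) - target) ** 2)
    # g applied to an instrumental sample should return the same sample.
    identity = np.mean((apply_mask(mask_fn, instrumental) - instrumental) ** 2)
    return separation, identity


rng = np.random.default_rng(0)
real_mix = rng.random((4, 8))       # stand-in for a mixed-music spectrogram
instrumental = rng.random((4, 8))   # stand-in for an unmatched instrumental one


def half_mask(x):
    """Toy mask that keeps half of every time-frequency bin."""
    return np.full_like(x, 0.5)


sep_loss, id_loss = training_losses(half_mask, real_mix, instrumental)
```

In an actual system `mask_fn` would be a trainable network and the two losses would be minimized jointly. The toy mask above only shows how the synthetic pair supplies a known instrumental target even though no isolated singing recordings are available.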

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.06087/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1812.06087/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1812.06087/full.md

---
Source: https://tomesphere.com/paper/1812.06087