Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision

Yun-Ning Hung; Gordon Wichern; Jonathan Le Roux

arXiv:2010.11904·cs.SD·October 23, 2020

TL;DR

This paper introduces a music source separation system trained with musical scores as weak supervision, eliminating the need for isolated sources during training and improving separation quality through adversarial training.

Contribution

The paper presents a training approach that uses musical scores as weak labels: a separator is trained jointly with a transcriptor that acts as a critic, and two adversarial losses are added for fine-tuning both models.

Findings

1. Score-based training outperforms training with temporal weak labels.

2. Adversarial training improves both separation and transcription accuracy.

3. The system does not require isolated sources for training.

Abstract

Most music source separation systems require large collections of isolated sources for training, which can be difficult to obtain. In this work, we use musical scores, which are comparatively easy to obtain, as a weak label for training a source separation system. In contrast with previous score-informed separation approaches, our system does not require isolated sources, and score is used only as a training target, not required for inference. Our model consists of a separator that outputs a time-frequency mask for each instrument, and a transcriptor that acts as a critic, providing both temporal and frequency supervision to guide the learning of the separator. A harmonic mask constraint is introduced as another way of leveraging score information during training, and we propose two novel adversarial losses for additional fine-tuning of both the transcriptor and the separator. Results…
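To make the masking idea concrete, here is a minimal NumPy sketch of how a score could be turned into a harmonic mask and how per-instrument time-frequency masks are applied to a mixture spectrogram. It is an illustration under stated assumptions, not the authors' implementation: the function names (`midi_to_hz`, `harmonic_mask`, `apply_masks`), the piano-roll score representation, the binary mask, and the parameters `sr`, `n_fft`, `n_harmonics`, and `width_bins` are all hypothetical choices made for this example.

```python
import numpy as np

def midi_to_hz(pitch):
    """Convert a MIDI note number to Hz (equal temperament, A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)

def harmonic_mask(piano_roll, sr=16000, n_fft=1024, n_harmonics=5, width_bins=1):
    """Build a binary (freq_bins, frames) mask from a (128, frames) piano roll.

    Assumption: STFT bins within `width_bins` of one of the first
    `n_harmonics` harmonics of an active note are set to 1, all others to 0.
    """
    n_bins = n_fft // 2 + 1
    mask = np.zeros((n_bins, piano_roll.shape[1]), dtype=np.float32)
    hz_per_bin = sr / n_fft
    # Iterate over every (pitch, frame) cell where the score marks a note as active.
    for pitch, frame in zip(*np.nonzero(piano_roll)):
        f0 = midi_to_hz(pitch)
        for h in range(1, n_harmonics + 1):
            center = int(round(h * f0 / hz_per_bin))
            if center >= n_bins:
                break  # harmonic lies above the Nyquist frequency
            lo = max(center - width_bins, 0)
            hi = min(center + width_bins + 1, n_bins)
            mask[lo:hi, frame] = 1.0
    return mask

def apply_masks(mixture_mag, masks):
    """Apply (n_instruments, freq, frames) masks to a (freq, frames) mixture magnitude."""
    return masks * mixture_mag[None, :, :]

# Usage: one instrument playing A4 in the first of two frames.
roll = np.zeros((128, 2), dtype=np.int64)
roll[69, 0] = 1
mask = harmonic_mask(roll)
estimate = apply_masks(np.ones((513, 2), dtype=np.float32), mask[None])
```

In this sketch the score only shapes a training-time constraint; at inference a separator would predict its masks from the mixture alone, consistent with the abstract's statement that the score is not required for inference.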
