Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments

Ethan Manilow; Prem Seetharaman; Bryan Pardo

arXiv:1910.12621·eess.AS·February 14, 2020

TL;DR

This paper introduces Cerberus, a deep learning model that simultaneously separates musical mixtures into individual instruments and transcribes them, improving performance on both tasks through joint learning.

Contribution

It presents a multi-task architecture that extends the Chimera source-separation network with a third head for transcription, so that a single network learns separation and transcription from a shared representation.

Findings

01. Joint training improves separation and transcription accuracy.

02. Cerberus outperforms separate models on unseen mixtures.

03. The shared representation benefits both tasks.

Abstract

We present a single deep learning architecture that can both separate an audio recording of a musical mixture into constituent single-instrument recordings and transcribe these instruments into a human-readable format at the same time, learning a shared musical representation for both tasks. This novel architecture, which we call Cerberus, builds on the Chimera network for source separation by adding a third "head" for transcription. By training each head with a different loss, we are able to jointly learn how to separate and transcribe up to 5 instruments in our experiments with a single network. We show that the two tasks are highly complementary and that, when learned jointly, they lead to Cerberus networks that are better at both separation and transcription and that generalize better to unseen mixtures.
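The three-head idea in the abstract can be illustrated with a toy sketch: one shared trunk feeds three output heads, and the per-head losses are combined into a single training objective. This is not the authors' implementation. The layer sizes, the class name `ThreeHeadSketch`, the helper `joint_loss`, the loss weights, and the placeholder loss values are all assumptions made for illustration. The labels "clustering" and "mask" reflect the two heads of the Chimera network (deep clustering and mask inference), which is background knowledge rather than something stated on this page.

```python
import math
import random

# Toy sizes (assumptions): frequency bins, hidden units, embedding dim, sources, pitches.
F, H, D, S, P = 8, 16, 4, 2, 12


def init(rows, cols, rng):
    """Create a small random weight matrix with shape rows x cols."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x, w):
    """Multiply a feature vector x by a weight matrix w."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class ThreeHeadSketch:
    """Hypothetical stand-in for a shared trunk with three task heads."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.trunk = init(F, H, rng)                 # shared representation
        self.embed_head = init(H, F * D, rng)        # per-bin embeddings (clustering)
        self.mask_head = init(H, F * S, rng)         # per-bin, per-source masks
        self.transcribe_head = init(H, S * P, rng)   # per-source pitch activations

    def forward(self, frame):
        shared = [math.tanh(v) for v in linear(frame, self.trunk)]
        embedding = linear(shared, self.embed_head)
        masks = [sigmoid(v) for v in linear(shared, self.mask_head)]
        piano_roll = [sigmoid(v) for v in linear(shared, self.transcribe_head)]
        return embedding, masks, piano_roll


def joint_loss(head_losses, weights):
    """Weighted sum of per-head losses: the single objective used for joint training."""
    return sum(weights[name] * head_losses[name] for name in head_losses)


model = ThreeHeadSketch()
frame = [0.5] * F  # one made-up spectrogram frame
embedding, masks, piano_roll = model.forward(frame)

# Placeholder loss values and weights, not results from the paper.
total = joint_loss(
    {"clustering": 1.2, "mask": 0.4, "transcription": 0.8},
    {"clustering": 0.1, "mask": 1.0, "transcription": 1.0},
)
```

Because all three heads read the same `shared` vector, a gradient from any one loss would update the trunk that the other heads also depend on, which is the mechanism by which joint training can let one task inform the other.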
