Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training

Kin Wai Cheuk; Keunwoo Choi; Qiuqiang Kong; Bochen Li; Minz Won; Ju-Chiang Wang; Yun-Ning Hung; Dorien Herremans

arXiv:2302.00286 · cs.SD · February 3, 2023 · 1 cites

TL;DR

Jointist is a multi-instrument framework that jointly transcribes and separates the instruments in a music recording. Joint training improves both tasks, an optional instrument recognition module gives users control over which instruments are processed, and the model achieves state-of-the-art transcription results on popular music.

Contribution

It introduces a joint training approach for multi-instrument transcription and source separation, with an optional instrument recognition module for user control.

Findings

01. Achieves state-of-the-art transcription performance on popular music.

02. Improves source separation by 5 SDR points.

03. Enhances downstream tasks like downbeat detection and chord recognition.
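
To make the SDR figure above concrete, here is a minimal sketch of the basic signal-to-distortion ratio in decibels, assuming that "SDR points" refers to dB. The function name `sdr_db` is hypothetical, and the paper may use a different SDR variant (for example one from an evaluation toolkit), so this shows only the textbook definition: 10 log10 of reference energy over error energy.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB (textbook definition).

    reference: ground-truth source samples
    estimate:  separated source samples, same length as reference
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true source.
    signal_energy = sum(r * r for r in reference)
    # Energy of the separation error (everything that differs from the source).
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference is silent; SDR is undefined")
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Illustrative values only, not data from the paper.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
score = sdr_db(reference, estimate)
```

Under this definition, a gain of 5 dB means the error energy drops by a factor of 10^0.5, roughly 3.2, for the same source energy.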

Abstract

In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results. The joint training of the transcription and source separation modules serves to improve the performance of both tasks. The instrument module is optional and can be directly controlled by human users. This makes Jointist a flexible user-controllable framework. Our challenging problem formulation makes the model highly useful in the real world given that modern popular music typically consists of multiple instruments. Its novelty, however,…
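
The data flow described in the abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: every function name, the instrument labels, the 88-pitch roll, and the hop size are hypothetical placeholders, and the three modules are stubs standing in for trained networks. It shows only the conditioning structure: instrument labels (predicted or user-supplied) drive the transcription module, and the separation module receives both the label and the piano roll.

```python
from typing import Dict, List, Optional, Tuple

N_PITCHES = 88   # assumed piano-roll height, not taken from the paper
HOP = 160        # assumed samples per piano-roll frame, not taken from the paper

Audio = List[float]
PianoRoll = List[List[float]]


def recognize_instruments(audio: Audio) -> List[str]:
    """Stub for the optional instrument recognition module."""
    # A trained model would predict these; fixed labels keep the sketch runnable.
    return ["piano", "bass"]


def transcribe(audio: Audio, instrument: str) -> PianoRoll:
    """Stub for the transcription module: one piano roll per instrument."""
    n_frames = len(audio) // HOP
    return [[0.0] * N_PITCHES for _ in range(n_frames)]


def separate(audio: Audio, instrument: str, roll: PianoRoll) -> Audio:
    """Stub for the separation module, conditioned on instrument and piano roll."""
    return [0.0] * len(audio)


def run_pipeline(
    audio: Audio, instruments: Optional[List[str]] = None
) -> Dict[str, Tuple[PianoRoll, Audio]]:
    """Return a (piano roll, separated stem) pair for each target instrument."""
    # The recognition module is optional: a user may pass the labels directly.
    if instruments is None:
        instruments = recognize_instruments(audio)
    results = {}
    for name in instruments:
        roll = transcribe(audio, name)
        stem = separate(audio, name, roll)  # separation reuses the transcription
        results[name] = (roll, stem)
    return results


audio = [0.0] * 1600
user_controlled = run_pipeline(audio, instruments=["drums"])
automatic = run_pipeline(audio)
```

Passing `instruments` explicitly corresponds to the user-controlled mode mentioned in the abstract; omitting it falls back to the recognition stub.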

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies