Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training
Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans

TL;DR
Jointist is a multi-instrument framework that transcribes and separates the instruments in a piece of music. Training the two tasks jointly improves both, an optional instrument recognition module gives users direct control, and transcription results on popular music are state of the art.
Contribution
It introduces a joint training approach for multi-instrument transcription and source separation, with an optional instrument recognition module for user control.
Findings
Achieves state-of-the-art transcription performance on popular music.
Improves source separation by 5 SDR points.
Enhances downstream tasks like downbeat detection and chord recognition.
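For readers unfamiliar with the metric, the sketch below computes a basic signal-to-distortion ratio in decibels. It assumes "SDR" in the finding above means this ratio and that the 5-point gain is measured in dB; the summary does not say which SDR variant the paper uses, and evaluation toolkits often apply more elaborate definitions. The function name `sdr_db` and the `eps` parameter are illustrative, not taken from the paper.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Basic signal-to-distortion ratio in dB (illustrative definition).

    SDR = 10 * log10(||reference||^2 / ||reference - estimate||^2)
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    # eps guards against division by zero for silent or perfect estimates
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))

# An estimate that is uniformly 10% too quiet has a 100:1 power ratio.
reference = np.ones(100)
estimate = 0.9 * reference
print(round(sdr_db(reference, estimate), 3))
```

Under this definition, a 5 dB gain corresponds to roughly a 3.2-fold reduction in error power relative to the reference signal.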
Abstract
In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results. The joint training of the transcription and source separation modules serves to improve the performance of both tasks. The instrument module is optional and can be directly controlled by human users. This makes Jointist a flexible user-controllable framework. Our challenging problem formulation makes the model highly useful in the real world given that modern popular music typically consists of multiple instruments. Its novelty, however,…
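The data flow described in the abstract can be sketched roughly as follows. This is only an illustration of how the three modules connect, not the authors' implementation: every function name, the 88-pitch piano-roll shape, the hop size, and the placeholder outputs are assumptions, and the real modules are trained neural networks.

```python
import numpy as np

N_PITCHES = 88   # assumed piano-roll pitch range (hypothetical)
HOP = 512        # assumed hop size in samples (hypothetical)

def recognize_instruments(audio):
    # Placeholder for the instrument recognition module: a real model
    # would predict which instruments are present in the clip.
    return ["piano", "bass"]

def transcribe(audio, instrument):
    # Placeholder for the transcription module: returns an
    # instrument-specific piano roll of shape (frames, pitches).
    n_frames = len(audio) // HOP
    return np.zeros((n_frames, N_PITCHES), dtype=np.float32)

def separate(audio, instrument, piano_roll):
    # Placeholder for the source separation module: a real model would
    # use the instrument label and the piano roll to estimate the source.
    return np.zeros_like(audio)

def jointist_like_pipeline(audio, instruments=None):
    # The recognition step is optional: a user may supply the
    # instrument list directly instead of relying on the model.
    if instruments is None:
        instruments = recognize_instruments(audio)
    outputs = {}
    for inst in instruments:
        roll = transcribe(audio, inst)        # conditioned on instrument
        source = separate(audio, inst, roll)  # conditioned on both
        outputs[inst] = (roll, source)
    return outputs

audio = np.zeros(5120, dtype=np.float32)
result = jointist_like_pipeline(audio, instruments=["drums"])
print(result["drums"][0].shape, result["drums"][1].shape)
```

Passing `instruments` explicitly mirrors the user-controllable mode mentioned in the abstract; leaving it as `None` falls back to the recognition stub.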
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
