SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation
Jaime Garcia-Martinez, David Diaz-Guerra, Archontis Politis, Tuomas Virtanen, Julio J. Carabias-Orti, and Pedro Vera-Candeas

TL;DR
This paper introduces SynthSOD, a new multitrack dataset for orchestra music source separation, addressing the lack of comprehensive datasets for extracting similar-sounding orchestral sources.
Contribution
SynthSOD is a novel, realistic, and heterogeneous dataset created using simulation techniques, enabling improved training for orchestra source separation models.
Findings
A baseline model trained on SynthSOD performs well on synthetic data.
The model generalizes to real-world orchestral recordings.
SynthSOD enhances the development of source separation techniques for orchestral music.
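As a rough illustration of how separation performance on bleed-free stems is commonly scored, the sketch below computes a plain signal-to-distortion ratio (SDR). This summary does not state which metric the paper reports, so the choice of SDR, the `sdr` helper, and the random-noise "stems" are all assumptions made for illustration only.

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    return float(10.0 * np.log10((signal_power + eps) / (error_power + eps)))

# Hypothetical bleed-free stems: random noise standing in for instrument tracks.
rng = np.random.default_rng(0)
stems = {name: rng.standard_normal(44100) for name in ("violin", "cello", "flute")}

# With clean (bleed-free) stems, the mixture is exactly the sum of the sources.
mixture = np.sum(list(stems.values()), axis=0)

# Trivial lower bound: use the unprocessed mixture as the estimate of each source.
baseline_sdr = {name: sdr(stem, mixture) for name, stem in stems.items()}
```

A separation model is useful only to the extent that its per-instrument SDR exceeds this do-nothing baseline. Having clean stems is what makes the reference signals, and therefore the score, trustworthy.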
Abstract
Recent advancements in music source separation have significantly progressed, particularly in isolating vocals, drums, and bass elements from mixed tracks. These developments owe much to the creation and use of large-scale, multitrack datasets dedicated to these specific components. However, the challenge of extracting similarly sounding sources from orchestra recordings has not been extensively explored, largely due to a scarcity of comprehensive and clean (i.e., bleed-free) multitrack datasets. In this paper, we introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic (i.e., using high-quality soundfonts), musically motivated, and heterogeneous training set comprising different dynamics, natural tempo changes, styles, and conditions. Moreover, we demonstrate the application of a widely used baseline music separation model…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training
