The Sound Demixing Challenge 2023 – Cinematic Demixing Track

Stefan Uhlich; Giorgio Fabbro; Masato Hirano; Shusuke Takahashi; Gordon Wichern; Jonathan Le Roux; Dipam Chakraborty; Sharada Mohanty; Kai Li; Yi Luo; Jianwei Yu; Rongzhi Gu; Roman Solovyev; Alexander Stempkovskiy; Tatiana Habruseva; Mikhail Sukhovei; Yuki Mitsufuji

arXiv:2308.06981 · eess.AS · April 19, 2024

TL;DR

The cinematic demixing track of the Sound Demixing Challenge 2023 evaluated approaches for separating the sources in cinematic audio. It showed that systems trained only on simulated data can outperform the baseline, and that making the simulated data more realistic is a major source of gains.

Contribution

This paper introduces the challenge setup, a new real-world dataset, and analyzes successful methods, emphasizing the impact of data realism on demixing performance.

Findings

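1. Among systems trained exclusively on the simulated Divide and Remaster (DnR) dataset, the best improved SDR by 1.8 dB over the cocktail-fork baseline.
2. On the open leaderboard, where any training data was allowed, the top system improved SDR by 5.7 dB over the baseline.
3. Making the simulated training data better match real cinematic audio significantly boosts performance.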

Abstract

This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. In particular, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most successful approaches employed by participants. Compared to the cocktail-fork baseline, the best-performing system trained exclusively on the simulated Divide and Remaster (DnR) dataset achieved an improvement of 1.8 dB in signal-to-distortion ratio (SDR), whereas the top-performing system on the open leaderboard, where any data could be used for training, achieved an improvement of 5.7 dB. A major source of this improvement was making the simulated data better match real cinematic audio, which we further investigate…
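
The 1.8 dB and 5.7 dB figures are differences in average SDR between a submitted system and the cocktail-fork baseline. The sketch below shows how such an improvement can be computed. It assumes the plain "global" SDR (reference energy over residual energy, in dB) averaged over stems. The exact SDR variant, the `EPS` constant, and the stem names `dialogue`, `music`, and `effects` are illustrative assumptions and are not taken from this page. The helper names are hypothetical.

```python
import math
from typing import Dict, Sequence

# Assumed stabilizing constant; the challenge's exact value is not given here.
EPS = 1e-7


def sdr(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Global SDR in dB: reference energy over residual (reference - estimate) energy."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + EPS) / (error_energy + EPS))


def mean_sdr(refs: Dict[str, Sequence[float]],
             ests: Dict[str, Sequence[float]]) -> float:
    """Average the per-stem SDR over all stems in `refs`."""
    return sum(sdr(refs[stem], ests[stem]) for stem in refs) / len(refs)


def sdr_improvement(refs: Dict[str, Sequence[float]],
                    system: Dict[str, Sequence[float]],
                    baseline: Dict[str, Sequence[float]]) -> float:
    """Mean SDR of a system minus mean SDR of a baseline, in dB."""
    return mean_sdr(refs, system) - mean_sdr(refs, baseline)


if __name__ == "__main__":
    # Toy stems; names are hypothetical, loosely mirroring DnR's speech/music/effects split.
    x = [1.0, -1.0, 1.0, -1.0]
    refs = {"dialogue": x, "music": x, "effects": x}
    system = {k: [0.5 * v for v in s] for k, s in refs.items()}  # halved-amplitude estimates
    baseline = {k: [0.0] * len(s) for k, s in refs.items()}       # silent estimates
    print(f"{sdr_improvement(refs, system, baseline):.2f} dB")    # prints 6.02 dB
```

In this toy case, the halved estimates leave a residual with one quarter of the reference energy, so each stem scores about 10·log10(4) ≈ 6.02 dB. The silent baseline scores 0 dB. Real evaluations would operate on multichannel audio arrays, for example with NumPy, rather than Python lists.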

Code & Models

Repositories

merlresearch/cocktail-fork-separation
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Digital Media Forensic Detection