Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support
Karn N. Watcharasupat, Chih-Wei Wu, and Iroro Orife

TL;DR
This paper introduces DnR v3, a multilingual cinematic audio source separation dataset that improves on DnR v2 in diversity and quality, and demonstrates that multilingual training improves model generalization across languages.
Contribution
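The authors develop DnR v3, a multilingual dataset for cinematic audio source separation that addresses DnR v2's issues with vocal content in non-dialogue stems, loudness distributions, the mastering process, and linguistic diversity, and show that multilingual training improves model performance across languages.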
Findings
Multilingual training improves model generalizability across languages.
Models trained on multilingual data perform as well as or better than monolingual models.
DnR v3 includes speech from more than 30 languages across multiple language families and addresses limitations identified in DnR v2.
Abstract
Cinematic audio source separation (CASS), as a problem of extracting the dialogue, music, and effects stems from their mixture, is a relatively new subtask of audio source separation. To date, only one publicly available dataset exists for CASS, that is, the Divide and Remaster (DnR) dataset, which is currently at version 2. While DnR v2 has been an incredibly useful resource for CASS, several areas of improvement have been identified, particularly through its use in the 2023 Sound Demixing Challenge. In this work, we develop version 3 of the DnR dataset, addressing issues relating to vocal content in non-dialogue stems, loudness distributions, mastering process, and linguistic diversity. In particular, the dialogue stem of DnR v3 includes speech content from more than 30 languages from multiple families including but not limited to the Germanic, Romance, Indo-Aryan, Dravidian,…
Taxonomy
Topics: Music and Audio Processing · Digital Media Forensic Detection · Handwritten Text Recognition Techniques
