Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support

Karn N. Watcharasupat, Chih-Wei Wu, and Iroro Orife

arXiv:2407.07275 · eess.AS · August 27, 2024

TL;DR

This paper introduces DnR v3, a multilingual cinematic audio source separation dataset with improved diversity and quality, and demonstrates that multilingual training improves model generalization across languages.

Contribution

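The authors develop DnR v3, a multilingual dataset for cinematic audio source separation that addresses DnR v2's issues with vocal content in non-dialogue stems, loudness distributions, the mastering process, and linguistic diversity, and show that multilingual training enhances model performance across languages.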

Findings

01. Multilingual training improves model generalizability across languages.

02. Models trained on multilingual data perform as well as or better than monolingual models.

03. DnR v3 includes speech from more than 30 languages and addresses limitations of DnR v2.

Abstract

Cinematic audio source separation (CASS), as a problem of extracting the dialogue, music, and effects stems from their mixture, is a relatively new subtask of audio source separation. To date, only one publicly available dataset exists for CASS, that is, the Divide and Remaster (DnR) dataset, which is currently at version 2. While DnR v2 has been an incredibly useful resource for CASS, several areas of improvement have been identified, particularly through its use in the 2023 Sound Demixing Challenge. In this work, we develop version 3 of the DnR dataset, addressing issues relating to vocal content in non-dialogue stems, loudness distributions, mastering process, and linguistic diversity. In particular, the dialogue stem of DnR v3 includes speech content from more than 30 languages from multiple families including but not limited to the Germanic, Romance, Indo-Aryan, Dravidian,…

Code & Models

Repositories

kwatcharasupat/source-separation-landing
none · Official

Taxonomy

Topics: Music and Audio Processing · Digital Media Forensic Detection · Handwritten Text Recognition Techniques