The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks
Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux

TL;DR
This paper introduces the cocktail fork problem, the task of separating an audio mixture into speech, music, and sound effects, and provides a new dataset and model benchmarks for this underexplored task.
Contribution
It formalizes the cocktail fork problem, creates the Divide and Remaster dataset, and proposes a multi-resolution model for improved three-source audio separation.
Findings
Best model achieves roughly 11 dB SI-SDR improvement over the unprocessed mixture for each source
Benchmark results establish baseline performance for the task
Dataset enables future research in three-source separation
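The SI-SDR improvement quoted above can be illustrated with a minimal sketch of the commonly used scale-invariant signal-to-distortion ratio. This is not the authors' evaluation code: the function names `si_sdr` and `si_sdr_improvement`, the `eps` stabilizer, the mean removal, and the synthetic signals are all assumptions made for illustration.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between a 1-D estimate and its reference."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # The usual definition assumes zero-mean signals.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the optimally scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )


def si_sdr_improvement(estimate, reference, mixture):
    """Gain in SI-SDR of the estimate over using the raw mixture as the estimate."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Synthetic illustration (random noise, not real audio): a hypothetical separator
# that attenuates the interfering sources by a factor of 10 in amplitude.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)     # stand-in for one stem, e.g. speech
interference = rng.standard_normal(16000)  # stand-in for the other two stems
mixture = reference + interference
estimate = reference + 0.1 * interference
improvement_db = si_sdr_improvement(estimate, reference, mixture)
```

Because the reference is rescaled by the projection coefficient before the ratio is taken, multiplying the estimate by any nonzero gain leaves the score essentially unchanged, which is what makes the metric scale-invariant. In a three-stem setting the improvement would be computed separately for each of speech, music, and sound effects.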
Abstract
The cocktail party problem aims at isolating any source of interest within a complex acoustic scene, and has long inspired audio source separation research. Recent efforts have mainly focused on separating speech from noise, speech from speech, musical instruments from each other, or sound events from each other. However, separating an audio mixture (e.g., movie soundtrack) into the three broad categories of speech, music, and sound effects (understood to include ambient noise and natural sound events) has been left largely unexplored, despite a wide range of potential applications. This paper formalizes this task as the cocktail fork problem, and presents the Divide and Remaster (DnR) dataset to foster research on this topic. DnR is built from three well-established audio datasets (LibriSpeech, FMA, FSD50k), taking care to reproduce conditions similar to professionally produced content…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Blind Source Separation Techniques
