PodcastMix: A dataset for separating music and speech in podcasts

Nicolás Schmidt; Jordi Pons; Marius Miron

arXiv:2207.07403 · cs.SD · July 18, 2022

TL;DR

PodcastMix is a dataset and benchmark for separating background music from foreground speech in podcasts. Models trained on its synthetic data may generalize imperfectly to real podcasts, yet the best baseline separates speech with a mean opinion score of 3.84 out of 5.

Contribution

We introduce PodcastMix, a large dataset and benchmark for music and speech separation in podcasts, including synthetic training data and real podcast evaluation sets.

Findings

01

Current deep learning models may have generalization issues on real podcasts, especially when trained on synthetic data.

02

The best baseline separates speech with a mean opinion score of 3.84 for "overall separation quality" (rated from 1 to 5).

03

Dataset and baselines are publicly available.

Abstract

We introduce PodcastMix, a dataset formalizing the task of separating background music and foreground speech in podcasts. We aim at defining a benchmark suitable for training and evaluating (deep learning) source separation models. To that end, we release a large and diverse training dataset based on programmatically generated podcasts. However, current (deep learning) models can run into generalization issues, especially when trained on synthetic data. To target potential generalization issues, we release an evaluation set based on real podcasts for which we design objective and subjective tests. Out of our experiments with real podcasts, we find that current (deep learning) models may have generalization issues. Yet, these can perform competently, e.g., our best baseline separates speech with a mean opinion score of 3.84 (rating "overall separation quality" from 1 to 5). The dataset…
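
To make the idea of "programmatically generated podcasts" concrete, the following is a minimal sketch of how a synthetic training example could be built by summing a speech signal and an attenuated music signal. It is not the PodcastMix generation recipe: the helper names `db_to_amplitude` and `mix_podcast`, the -12 dB music gain, the peak rescaling, and the toy signals are all assumptions chosen for illustration. Only the Python standard library is used.

```python
import math
import random


def db_to_amplitude(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def mix_podcast(speech, music, music_gain_db=-12.0):
    """Return (mixture, speech_target, music_target), all of equal length.

    Hypothetical helper: the default gain and the peak rescaling are
    illustrative assumptions, not the procedure used by the authors.
    """
    # Trim both sources to the shorter one so samples line up.
    n = min(len(speech), len(music))
    gain = db_to_amplitude(music_gain_db)
    speech_target = list(speech[:n])
    music_target = [gain * m for m in music[:n]]
    mixture = [s + m for s, m in zip(speech_target, music_target)]

    # If the mixture would clip, rescale all three signals together so
    # that mixture == speech_target + music_target still holds.
    peak = max((abs(x) for x in mixture), default=0.0)
    if peak > 1.0:
        scale = 1.0 / peak
        mixture = [scale * x for x in mixture]
        speech_target = [scale * x for x in speech_target]
        music_target = [scale * x for x in music_target]
    return mixture, speech_target, music_target


# Toy signals standing in for real recordings (1 second at 8 kHz):
# noise as a placeholder for speech, a 440 Hz sine as a placeholder for music.
sample_rate = 8000
rng = random.Random(0)
speech = [0.5 * rng.uniform(-1.0, 1.0) for _ in range(sample_rate)]
music = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate)
         for t in range(sample_rate)]

mixture, speech_target, music_target = mix_podcast(speech, music, music_gain_db=-12.0)
```

A separation model would take `mixture` as input and be trained to recover `speech_target` and `music_target`. Keeping the mixture exactly equal to the sum of the two targets is a common design choice in source separation, since it lets objective metrics compare each estimate against a well-defined reference.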

Code & Models

Repositories

mtg/podcastmix
PyTorch · Official

Taxonomy

Topics: Radio, Podcasts, and Digital Media · Music and Audio Processing · Speech and Audio Processing