DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction

Jiarui Hai; Helin Wang; Dongchao Yang; Karan Thakkar; Najim Dehak; Mounya Elhilali

arXiv:2310.04567 · eess.AS · October 11, 2023 · 1 citation

TL;DR

This paper introduces DPM-TSE, a novel generative diffusion probabilistic model for target sound extraction that improves sound quality and separation from background noise, outperforming traditional discriminative methods.

Contribution

DPM-TSE is the first generative diffusion model applied to target sound extraction, enhancing separation quality and noise handling compared to prior discriminative approaches.

Findings

01. Significant improvement in perceived sound quality.

02. Enhanced separation from background noise.

03. Background noise issues common to diffusion models are addressed with a correction method for noise schedules and sample steps.

Abstract

Common target sound extraction (TSE) approaches have primarily relied on discriminative methods to separate the target sound while minimizing interference from unwanted sources, with varying success in separating the target from the background. This study introduces DPM-TSE, the first generative method based on diffusion probabilistic modeling (DPM) for target sound extraction, to achieve both cleaner target renderings and improved separability from unwanted sounds. The technique also tackles common background noise issues with DPM by introducing a correction method for noise schedules and sample steps. This approach is evaluated using both objective and subjective quality metrics on the FSD Kaggle 2018 dataset. The results show that DPM-TSE achieves a significant improvement in perceived quality in terms of target extraction and purity.
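The abstract does not spell out the correction applied to the noise schedule. As an illustration only, the sketch below shows one widely used correction of this kind: rescaling a schedule so that the signal-to-noise ratio at the final step is exactly zero, which means the last training step sees pure noise, matching what sampling starts from. Treating this as the paper's method is an assumption; the linear schedule, the default values, and all function names here are hypothetical and chosen for the example.

```python
import math


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Plain linear beta schedule (illustrative defaults, not from the paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def sqrt_alpha_bars(betas):
    """sqrt of the cumulative product of (1 - beta_t) for each step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(math.sqrt(prod))
    return out


def rescale_zero_terminal_snr(betas):
    """Shift and scale sqrt(alpha_bar) so the last value is 0 and the first is unchanged.

    Assumed correction for illustration; not confirmed as the paper's procedure.
    """
    s = sqrt_alpha_bars(betas)
    s_first, s_last = s[0], s[-1]
    # Shift so the final value becomes zero, then scale to restore the first value.
    s = [(x - s_last) * s_first / (s_first - s_last) for x in s]
    alpha_bar = [x * x for x in s]
    # Recover per-step alphas from the cumulative products, then betas.
    alphas = [alpha_bar[0]] + [
        alpha_bar[i] / alpha_bar[i - 1] for i in range(1, len(alpha_bar))
    ]
    return [1.0 - a for a in alphas]


def terminal_snr(betas):
    """SNR at the final step: alpha_bar_T / (1 - alpha_bar_T)."""
    alpha_bar_last = sqrt_alpha_bars(betas)[-1] ** 2
    return alpha_bar_last / (1.0 - alpha_bar_last)


if __name__ == "__main__":
    original = linear_betas(1000)
    corrected = rescale_zero_terminal_snr(original)
    print("terminal SNR before:", terminal_snr(original))
    print("terminal SNR after: ", terminal_snr(corrected))
```

With an uncorrected schedule the final step still carries a small amount of signal, so a model trained on it never sees pure noise even though sampling begins from pure noise. In a generative audio model, that mismatch is one plausible source of residual background noise in the output.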

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Underwater Acoustics Research

Methods: Diffusion