TSE-PI: Target Sound Extraction under Reverberant Environments with Pitch Information

Yiwen Wang; Xihong Wu

arXiv:2406.08716 · cs.SD · June 14, 2024

TL;DR

This paper introduces TSE-PI, a target sound extraction model inspired by auditory scene analysis. It combines pitch information with a learnable Gammatone filterbank and reports a 2.4 dB improvement in reverberant environments.

Contribution

The paper proposes a TSE model that feeds pitch cues into a modified Waveformer and replaces its convolutional encoder with a learnable Gammatone filterbank, improving extraction under reverberation.

Findings

01 Achieves a 2.4 dB improvement in target sound extraction in reverberant environments.

02 Uses pitch information and a learnable Gammatone filterbank to improve model robustness.

03 Demonstrates effectiveness on the FSD50K dataset.
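
The Gammatone filterbank named in the findings can be illustrated with a minimal sketch. The code below builds fixed fourth-order Gammatone impulse responses using the standard Glasberg and Moore ERB formula; it is not the authors' implementation. TSE-PI uses a learnable filterbank, so quantities such as center frequency and bandwidth would presumably be trainable rather than fixed (an assumption, since this page does not say which parameters are learned). The function names, sample rate, filter length, and center frequencies are illustrative choices.

```python
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, sample_rate=16000, duration_s=0.025, order=4, phase=0.0):
    """Peak-normalized Gammatone impulse response:
    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phase).
    All defaults are illustrative, not values taken from the paper.
    """
    b = 1.019 * erb(fc_hz)  # bandwidth parameter tied to the ERB
    n_samples = int(duration_s * sample_rate)
    ir = []
    for k in range(n_samples):
        t = k / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t + phase))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


def filterbank(center_freqs_hz, **kwargs):
    """One impulse response per center frequency (hypothetical helper)."""
    return [gammatone_ir(fc, **kwargs) for fc in center_freqs_hz]


# Three example channels; a real front end would use many more.
bank = filterbank([250.0, 1000.0, 4000.0])
```

Convolving the mixture waveform with each impulse response yields one band-limited channel per filter, which is the role a convolutional encoder would otherwise play. In a learnable variant these kernels would be recomputed from trainable parameters at each training step.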

Abstract

Target sound extraction (TSE) separates the target sound from the mixture signals based on provided clues. However, the performance of existing models significantly degrades under reverberant conditions. Inspired by auditory scene analysis (ASA), this work proposes a TSE model provided with pitch information named TSE-PI. Conditional pitch extraction is achieved through the Feature-wise Linearly Modulated layer with the sound-class label. A modified Waveformer model combined with pitch information, employing a learnable Gammatone filterbank in place of the convolutional encoder, is used for target sound extraction. The inclusion of pitch information is aimed at improving the model's performance. The experimental results on the FSD50K dataset illustrate 2.4 dB improvements of target sound extraction under reverberant environments when incorporating pitch information and Gammatone…
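
The feature-wise linear modulation (FiLM) idea in the abstract can be sketched as follows. A FiLM layer scales and shifts each feature channel with a gain and bias predicted from a conditioning input, here a one-hot sound-class label. This is a dependency-free illustration under assumed shapes and a random initialization; the class `FiLM`, the helpers, and the way the label is encoded are hypothetical, and how the paper wires the layer into its pitch extractor is not stated on this page.

```python
import random


def one_hot(index, num_classes):
    """One-hot encoding of a sound-class label."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def linear(weights, bias, x):
    """Plain affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class FiLM:
    """Per-channel modulation: out[c][t] = gamma[c] * x[c][t] + beta[c],
    with gamma and beta predicted from the conditioning vector."""

    def __init__(self, num_classes, num_channels, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.gauss(0.0, 0.1) for _ in range(num_classes)]
                    for _ in range(num_channels)]

        self.w_gamma = init()
        self.b_gamma = [1.0] * num_channels  # start near the identity scale
        self.w_beta = init()
        self.b_beta = [0.0] * num_channels   # start near zero shift

    def __call__(self, features, condition):
        gamma = linear(self.w_gamma, self.b_gamma, condition)
        beta = linear(self.w_beta, self.b_beta, condition)
        return [[g * v + b for v in channel]
                for channel, g, b in zip(features, gamma, beta)]


# features: 2 channels x 3 time steps; label 1 out of 4 classes
film = FiLM(num_classes=4, num_channels=2)
features = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
modulated = film(features, one_hot(1, 4))
```

Because the same features are modulated differently for each label, a single network can extract class-specific information. In a trained model the weights would be learned jointly with the rest of the network, typically in a framework such as PyTorch.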

Code & Models

Repositories

wyw97/TSE_PI (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing