Triple-0: Zero-shot denoising and dereverberation on an end-to-end frozen anechoic speech separation network

Sania Gul; Muhammad Salman Khan; Ata Ur-Rehman

PMC · DOI: 10.1371/journal.pone.0301692 · July 16, 2024

Open Access

TL;DR

This paper introduces a speech enhancement method that uses a pretrained network, without any fine-tuning, to denoise and dereverberate speech.

Contribution

The novel contribution is a zero-shot speech denoising and dereverberation method using a frozen pre-trained network without architectural changes or fine-tuning.

Findings

01

The frozen network performs well in zero-shot testing under noisy and reverberant conditions, despite no prior exposure to such conditions.

02

The model outperforms the weighted prediction error (WPE) algorithm in speech quality and intelligibility metrics on the same training dataset.

03

The proposed model achieves comparable performance to deep learning SD&D algorithms under varying noise and reverberation conditions.

Abstract

Speech enhancement is crucial both for human and machine listening applications. Over the last decade, the use of deep learning for speech enhancement has resulted in tremendous improvement over the classical signal processing and machine learning methods. However, training a deep neural network is not only time-consuming; it also requires extensive computational resources and a large training dataset. Transfer learning, i.e. using a pretrained network for a new task, comes to the rescue by reducing the amount of training time, computational resources, and the required dataset, but the network still needs to be fine-tuned for the new task. This paper presents a novel method of speech denoising and dereverberation (SD&D) on an end-to-end frozen binaural anechoic speech separation network. The frozen network requires neither any architectural change nor any fine-tuning for the new task,…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing