BERT for Joint Multichannel Speech Dereverberation with Spatial-aware Tasks

Yang Jiao

arXiv:2010.10892·eess.AS·October 23, 2020

TL;DR

This paper introduces a BERT-based neural network model for joint multichannel speech dereverberation, DOA estimation, and speech separation, leveraging sequence modeling capabilities for improved speech enhancement.

Contribution

It presents a novel supervised transformer-based approach that encodes spectral magnitude and phase for multiple tasks in a unified framework, enhancing speech dereverberation and spatial awareness.

Findings

01 Effective in joint dereverberation and spatial tasks

02 Improves speech separation accuracy

03 Demonstrates robustness across varied utterance lengths

Abstract

We propose a method for joint multichannel speech dereverberation with two spatial-aware tasks: direction-of-arrival (DOA) estimation and speech separation. The proposed method treats the involved tasks as a sequence-to-sequence mapping problem, a formulation general enough for a variety of front-end speech enhancement tasks. It is inspired by the excellent sequence modeling capability of bidirectional encoder representations from transformers (BERT). Instead of using explicit representations obtained from self-supervised pretraining, we use transformer-encoded hidden representations in a supervised manner. Both the multichannel spectral magnitude and the spectral phase information of varying-length utterances are encoded. Experimental results demonstrate the effectiveness of the proposed method.
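As a rough illustration of what encoding multichannel magnitude and phase for varying-length utterances could look like as input to a sequence model, here is a minimal sketch in plain Python. The abstract does not specify the feature layout, so everything here is an assumption: the per-frame concatenation of magnitudes and phases, the use of raw phase in radians, zero-padding with a boolean mask, and the hypothetical helper names `frame_tokens` and `pad_batch`. It is not the paper's implementation.

```python
import cmath


def frame_tokens(stft):
    """Turn one utterance's multichannel STFT into a sequence of frame vectors.

    stft[c][t][f] is a complex STFT coefficient for channel c, frame t, bin f.
    Assumed layout (not from the paper): each frame becomes one token holding
    the magnitudes of all channels followed by the phases of all channels.
    """
    num_frames = len(stft[0])
    tokens = []
    for t in range(num_frames):
        magnitude = [abs(x) for channel in stft for x in channel[t]]
        phase = [cmath.phase(x) for channel in stft for x in channel[t]]
        tokens.append(magnitude + phase)
    return tokens


def pad_batch(sequences):
    """Zero-pad token sequences to a common length and build a padding mask.

    mask[i][t] is True for real frames and False for padding, so that an
    encoder can ignore padded positions in self-attention.
    """
    max_len = max(len(seq) for seq in sequences)
    dim = len(sequences[0][0])
    padded, mask = [], []
    for seq in sequences:
        pad = max_len - len(seq)
        padded.append(seq + [[0.0] * dim for _ in range(pad)])
        mask.append([True] * len(seq) + [False] * pad)
    return padded, mask


# Two toy utterances: 2 channels, 2 frequency bins, with 3 frames and 1 frame.
utt_a = [[[1 + 0j, 0 + 1j], [3 + 4j, -1 + 0j], [0 - 2j, 1 + 1j]],
         [[0 + 1j, 1 + 0j], [1 + 1j, 2 + 0j], [-1 + 0j, 0 + 3j]]]
utt_b = [[[2 + 0j, 0 - 1j]],
         [[0 + 5j, 1 + 0j]]]

batch, mask = pad_batch([frame_tokens(utt_a), frame_tokens(utt_b)])
```

In this sketch, `batch` holds two sequences of three 8-dimensional frame vectors (2 channels × 2 bins × magnitude and phase), and `mask` marks which frames are real. A transformer encoder would consume both, with task-specific output layers for dereverberation, DOA estimation, and separation placed on top of the hidden representations.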

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Blind Source Separation Techniques