Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks

Vikas Tokala; Eric Grinstein; Mike Brookes; Simon Doclo; Jesper Jensen; Patrick A. Naylor

arXiv:2403.05393·eess.AS·March 11, 2024·ICASSP·2 cites

Open Access · 1 Repo

TL;DR

This paper introduces a deep complex convolutional transformer network for binaural speech enhancement, improving speech intelligibility and spatial cue preservation in noisy environments for assistive listening devices.

Contribution

It proposes a novel neural network architecture combining complex convolutional and transformer modules for binaural speech enhancement, with a specialized loss function for spatial cue preservation.

Findings

01 Enhanced speech intelligibility in simulated noisy scenarios.

02 Better preservation of binaural cues compared to baseline methods.

03 Effective noise reduction across various noise types.
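
To make "preservation of binaural cues" concrete, here is a minimal sketch of two commonly used interaural cues: the interaural level difference (ILD) and the interaural phase difference (IPD), computed per time-frequency bin. This page does not state which cues the paper measures, so treating ILD and IPD as the relevant cues is an assumption, and the function names are hypothetical rather than taken from the authors' repository.

```python
import numpy as np

def interaural_level_difference(left, right, eps=1e-8):
    """ILD in dB per time-frequency bin (assumed cue, not confirmed by the page)."""
    return 20.0 * np.log10((np.abs(left) + eps) / (np.abs(right) + eps))

def interaural_phase_difference(left, right):
    """IPD in radians per time-frequency bin, wrapped to (-pi, pi]."""
    return np.angle(left * np.conj(right))

def mean_abs_cue_error(cue_enhanced, cue_clean):
    """Mean absolute deviation between cues of the enhanced and clean signals."""
    return float(np.mean(np.abs(cue_enhanced - cue_clean)))

# Toy STFT coefficients with shape (frequency bins, frames).
left = np.array([[2.0 + 0.0j, 0.0 + 1.0j]])
right = np.array([[1.0 + 0.0j, 1.0 + 0.0j]])

ild = interaural_level_difference(left, right)
ipd = interaural_phase_difference(left, right)

# Scaling both ears by the same real gain leaves both cues unchanged.
ild_error = mean_abs_cue_error(interaural_level_difference(0.5 * left, 0.5 * right), ild)
ipd_error = mean_abs_cue_error(interaural_phase_difference(0.5 * left, 0.5 * right), ipd)
```

An enhancement method that applies different gains or phase shifts to the two ears would change these values, which is one way a cue-preservation error could be quantified.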

Abstract

Studies have shown that in noisy acoustic environments, providing binaural signals to the user of an assistive listening device may improve speech intelligibility and spatial awareness. This paper presents a binaural speech enhancement method using a complex convolutional neural network with an encoder-decoder architecture and a complex multi-head attention transformer. The model is trained to estimate individual complex ratio masks in the time-frequency domain for the left and right-ear channels of binaural hearing devices. The model is trained using a novel loss function that incorporates the preservation of spatial information along with speech intelligibility improvement and noise reduction. Simulation results for acoustic scenarios with a single target speaker and isotropic noise of various types show that the proposed method improves the estimated binaural speech intelligibility…
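
The masking step described in the abstract can be sketched as follows. A complex ratio mask is multiplied element-wise with the noisy short-time Fourier transform (STFT) of each ear, so that both magnitude and phase can be corrected. In the paper the masks are estimated by the network; the sketch below substitutes an oracle mask computed from a known clean signal purely for illustration. The array layout `(ear, frequency bin, frame)` and all function names are assumptions, not the authors' code.

```python
import numpy as np

def ideal_complex_ratio_mask(clean_stft, noisy_stft, eps=1e-8):
    """Oracle mask M = S / Y, regularised to avoid division by zero.

    Stand-in for the network output; the real model predicts the mask.
    """
    return clean_stft * np.conj(noisy_stft) / (np.abs(noisy_stft) ** 2 + eps)

def apply_complex_ratio_mask(noisy_stft, mask):
    """Element-wise complex multiplication of the mask with the noisy STFT."""
    return mask * noisy_stft

rng = np.random.default_rng(0)
shape = (2, 5, 4)  # assumed layout: (left/right ear, frequency bin, frame)

# Synthetic complex STFT coefficients standing in for real recordings.
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

# One mask per ear, as in the abstract: index 0 is left, index 1 is right.
masks = ideal_complex_ratio_mask(clean, noisy)
enhanced = apply_complex_ratio_mask(noisy, masks)
```

Because the left and right masks are separate, nothing in this step alone forces the interaural relationships to survive, which is consistent with the abstract's emphasis on a loss term for spatial information.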

Code & Models

Repositories

vikastokala/bcctn · PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention