SRIB-LEAP submission to Far-field Multi-Channel Speech Enhancement Challenge for Video Conferencing

R G Prithvi Raj; Rohit Kumar; M K Jayesh; Anurenjan Purushothaman; Sriram Ganapathy; M A Basha Shaik

arXiv:2106.12763·eess.AS·June 25, 2021

TL;DR

This paper introduces a two-stage multi-channel speech enhancement method for video conferencing: a self-attention-based beamformer followed by CNN-LSTM single-channel enhancement. The method improves PESQ by 0.5 on noisy data.

Contribution

The paper presents a novel two-stage approach integrating self-attention beamforming with CNN-LSTM enhancement for far-field speech in conferencing environments.

Findings

01. PESQ improved by 0.5 on noisy data

02. MOS increased by 0.9 points

03. Effective in enhancing speech quality in real scenarios

Abstract

This paper presents the details of the SRIB-LEAP submission to the ConferencingSpeech challenge 2021. The challenge involved the task of multi-channel speech enhancement to improve the quality of far-field speech from microphone arrays in a video conferencing room. We propose a two-stage method involving a beamformer followed by single-channel enhancement. For the beamformer, we incorporated a self-attention mechanism as the inter-channel processing layer in the filter-and-sum network (FaSNet), an end-to-end time-domain beamforming system. The single-channel speech enhancement is done in the log spectral domain using a convolutional neural network (CNN)-long short-term memory (LSTM) based architecture. We achieved an improvement of 0.5 in perceptual evaluation of speech quality (PESQ), an objective quality metric, on the noisy data. On subjective quality evaluation, the proposed approach improved the…
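To make the idea of self-attention as an inter-channel processing layer more concrete, here is a minimal sketch of scaled dot-product attention applied across microphone channels. It is not the authors' implementation: the function names, the toy feature vectors, and the use of identity projections (queries, keys and values are the raw per-channel features, with no learned weights) are all assumptions made for illustration. In a FaSNet-style system this kind of layer would operate on learned per-channel features inside a trained network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inter_channel_attention(channel_feats):
    """Scaled dot-product self-attention across channels (hypothetical helper).

    channel_feats: list of C feature vectors, one per microphone channel,
    each of length D. Queries, keys and values are the raw features
    (identity projections), which is a simplification of a learned layer.

    Returns (mixed, weights): mixed[i] is the attention-weighted combination
    of all channels seen from channel i, and weights[i][j] is how strongly
    channel i attends to channel j.
    """
    dim = len(channel_feats[0])
    scale = math.sqrt(dim)
    mixed, weights = [], []
    for query in channel_feats:
        # Similarity of this channel to every channel, including itself.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in channel_feats
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum of the channel features, dimension by dimension.
        mixed.append([
            sum(a * value[d] for a, value in zip(attn, channel_feats))
            for d in range(dim)
        ])
    return mixed, weights


# Toy input: 3 channels with 4-dimensional features (made-up numbers).
feats = [
    [1.0, 0.0, 0.5, 0.2],
    [0.9, 0.1, 0.4, 0.3],
    [0.0, 1.0, 0.0, 0.8],
]
mixed, weights = inter_channel_attention(feats)
```

Each row of `weights` sums to one, so every output vector is a convex combination of the channel features. This lets each channel draw on the others without fixing the number or order of microphones in advance.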

Taxonomy

Methods: Convolution