Inter-channel Conv-TasNet for multichannel speech enhancement

Dongheon Lee, Seongrae Kim, and Jung-Woo Choi

arXiv:2111.04312 · eess.AS · October 28, 2024 · 1 cites

Open Access

TL;DR

This paper introduces an advanced multichannel speech enhancement network based on Conv-TasNet, which effectively exploits inter-channel relationships and spatial information to significantly improve speech quality and noise suppression.

Contribution

It extends Conv-TasNet into a multichannel framework that fully utilizes inter-channel relationships and spatial information, outperforming existing models with fewer parameters.

Findings

01 Outperforms state-of-the-art multichannel neural networks

02 Uses fewer parameters while achieving better enhancement

03 Significant improvements in SDR, PESQ, and STOI on CHiME-3
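
For readers unfamiliar with SDR (signal-to-distortion ratio), the sketch below computes it as a plain energy ratio between a clean reference and the residual error, in decibels. This is an assumption made for illustration: the paper's evaluation may use a different SDR variant (for example the BSS Eval definition), and the function name `sdr_db` and the toy signals are hypothetical.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Plain energy-ratio SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Illustrative definition only; not necessarily the variant used in the paper.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate          # residual distortion
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(error**2))

# Toy example: a constant offset of 0.1 on a unit-amplitude signal.
clean = np.array([1.0, -1.0, 1.0, -1.0])
enhanced = clean + 0.1
score = sdr_db(clean, enhanced)
```

In this toy case the reference energy is 100 times the error energy, so `score` comes out at about 20 dB. A higher value means less residual distortion.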

Abstract

Speech enhancement in multichannel settings is typically realized by exploiting the spatial information embedded in multiple microphone signals. Deep neural networks (DNNs) have recently advanced this field; however, research on efficient multichannel network structures that fully exploit spatial information and inter-channel relationships is still in its early stages. In this study, we propose an end-to-end time-domain speech enhancement network that facilitates the use of inter-channel relationships at individual layers of a DNN. The proposed technique is based on the fully convolutional time-domain audio separation network (Conv-TasNet), originally developed for speech separation tasks. We extend Conv-TasNet into several forms that can handle multichannel input signals and learn inter-channel relationships. To this end, we modify the encoder-mask-decoder structures of…
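
As background on the encoder-mask-decoder structure named in the abstract, the following is a minimal single-channel sketch of the data flow: overlapping frames are projected onto a set of basis filters (encoder), a mask scales the resulting features, and a second projection with overlap-add returns a waveform (decoder). Everything here is an assumption for illustration: the weights are random rather than learned, a single sigmoid layer stands in for Conv-TasNet's temporal convolutional mask estimator, and the names `frame_signal`, `overlap_add`, and `encoder_mask_decoder` are hypothetical. It does not reproduce the authors' multichannel extensions, which the truncated abstract does not specify.

```python
import numpy as np

def frame_signal(x, win, hop):
    """Split a 1-D signal into overlapping frames of length `win`."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]                                   # shape (n_frames, win)

def overlap_add(frames, hop, length):
    """Sum overlapping frames back into a 1-D signal of `length` samples."""
    n_frames, win = frames.shape
    out = np.zeros(length)
    for i in range(n_frames):
        out[i * hop : i * hop + win] += frames[i]
    return out

def encoder_mask_decoder(x, enc, mask_w, dec, hop):
    """x: (T,), enc: (win, N), mask_w: (N, N), dec: (N, win)."""
    win = enc.shape[0]
    frames = frame_signal(x, win, hop)              # (F, win)
    feats = np.maximum(frames @ enc, 0.0)           # encoder: projection + ReLU
    mask = 1.0 / (1.0 + np.exp(-(feats @ mask_w)))  # placeholder mask in (0, 1)
    masked = feats * mask                           # apply mask to features
    y = overlap_add(masked @ dec, hop, len(x))      # decoder: back to waveform
    return y, mask

# Random (untrained) weights, purely to show the shapes involved.
rng = np.random.default_rng(0)
win, hop, n_basis = 16, 8, 32
x = rng.standard_normal(win + hop * 49)             # 408 samples -> 50 frames
enc = 0.1 * rng.standard_normal((win, n_basis))
mask_w = 0.1 * rng.standard_normal((n_basis, n_basis))
dec = 0.1 * rng.standard_normal((n_basis, win))
y, mask = encoder_mask_decoder(x, enc, mask_w, dec, hop)
```

The output `y` has the same length as the input, which is what makes the pipeline end-to-end in the time domain: no spectrogram or phase reconstruction step sits between input and output.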

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Convolution