Multi-Channel Speech Enhancement using Graph Neural Networks

Panagiotis Tzirakis; Anurag Kumar; Jacob Donley

arXiv:2102.06934·cs.SD·February 16, 2021

TL;DR

This paper introduces a multi-channel speech enhancement method that uses graph neural networks to model spatial correlations among microphone signals, and reports better performance than prior state-of-the-art methods.

Contribution

The paper applies graph neural networks to multi-channel speech enhancement by treating each audio channel as a node in a graph, so that spatial relationships among channels are modeled in a non-Euclidean space.

Findings

01 Outperforms prior state-of-the-art methods

02 Effective across various microphone array configurations

03 Demonstrates robustness in simulated room acoustics

Abstract

Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones. Recently proposed methods tackle this problem by incorporating deep neural network models with spatial filtering techniques such as the minimum variance distortionless response (MVDR) beamformer. In this paper, we introduce a different research direction by viewing each audio channel as a node lying in a non-Euclidean space and, specifically, a graph. This formulation allows us to apply graph neural networks (GNN) to find spatial correlations among the different channels (nodes). We utilize graph convolution networks (GCN) by incorporating them in the embedding space of a U-Net architecture. We use the LibriSpeech dataset and simulated room acoustics data to experiment extensively with our approach using different array types and numbers of microphones. Results…
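To make the "channels as graph nodes" idea concrete, here is a minimal sketch of a single graph convolution applied across microphone channels. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the graph is built, so the fully connected adjacency with unit edge weights, the symmetric normalization D^{-1/2}(A + I)D^{-1/2} (the common GCN formulation), the random per-channel embeddings standing in for U-Net encoder outputs, and the names `normalize_adjacency` and `gcn_layer` are all hypothetical choices made for this example.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2} for a non-negative adjacency matrix A."""
    adj_hat = adj + np.eye(adj.shape[0])   # add self-loops so each node keeps its own features
    deg = adj_hat.sum(axis=1)              # degrees are >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(node_feats: np.ndarray, adj_norm: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(adj_norm @ node_feats @ weight, 0.0)


rng = np.random.default_rng(0)
num_mics, feat_dim, out_dim = 4, 8, 8

# Hypothetical per-channel embeddings (one row per microphone), standing in
# for the bottleneck features a U-Net encoder would produce for each channel.
embeddings = rng.standard_normal((num_mics, feat_dim))

# Assumption: every microphone is connected to every other with unit weight.
adjacency = np.ones((num_mics, num_mics)) - np.eye(num_mics)

# Random weights stand in for parameters that would be learned during training.
weight = 0.1 * rng.standard_normal((feat_dim, out_dim))

mixed = gcn_layer(embeddings, normalize_adjacency(adjacency), weight)
print(mixed.shape)  # (4, 8)
```

With a fully connected, unit-weight graph the normalized adjacency simply averages the channel embeddings, so every node ends up with the same output row. A weighted adjacency (for example, one derived from array geometry or learned from the data) would let each channel mix information from the others unevenly; which of these the paper uses is not stated in the abstract.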
