Self-Attention Generative Adversarial Network for Speech Enhancement

Huy Phan; Huy Le Nguyen; Oliver Y. Chén; Philipp Koch; Ngoc Q. K. Duong; Ian McLoughlin; Alfred Mertins

arXiv:2010.09132·cs.SD·February 9, 2021

TL;DR

This paper introduces a self-attention mechanism into speech enhancement GANs to better capture temporal dependencies, resulting in improved performance without significant computational overhead.

Contribution

The paper adapts non-local self-attention to a speech enhancement GAN (SEGAN), studies how the placement of the self-attention layer affects results, and reports consistent gains across objective evaluation metrics.

Findings

01. Self-attention improves speech enhancement quality.

02. Placement of self-attention at high-level layers is most efficient.

03. Performance gains are consistent across evaluation metrics.

Abstract

Existing generative adversarial networks (GANs) for speech enhancement solely rely on the convolution operation, which may obscure temporal dependencies across the sequence input. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) using raw signal input. Further, we empirically study the effect of placing the self-attention layer at the (de)convolutional layers with varying layer indices as well as at all of them when memory allows. Our experiments show that introducing self-attention to SEGAN leads to consistent improvement across the objective evaluation metrics of enhancement performance. Furthermore, applying at different (de)convolutional layers does not significantly alter performance, suggesting that it can be conveniently applied at the…
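The idea of attaching a non-local style self-attention layer to a (de)convolutional feature map can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `self_attention_1d`, the projection matrices, the projection width `D`, and the residual scale `beta` are all assumptions made for the example and are not taken from the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_1d(feats, w_q, w_k, w_v, w_o, beta=0.0):
    """Non-local style self-attention over a 1D feature map.

    feats: (T, C) array, T time steps and C channels, standing in for the
           output of one (de)convolutional layer (hypothetical shapes).
    w_q, w_k, w_v: (C, D) projections; w_o: (D, C) output projection.
    beta: scalar weight on the attention branch (assumed learnable in a
          real model; beta = 0 leaves the convolutional features untouched).
    """
    q = feats @ w_q                      # queries, (T, D)
    k = feats @ w_k                      # keys,    (T, D)
    v = feats @ w_v                      # values,  (T, D)
    attn = softmax(q @ k.T, axis=-1)     # (T, T): every step attends to all steps
    out = (attn @ v) @ w_o               # project back to C channels, (T, C)
    return feats + beta * out, attn      # residual connection around attention


# Toy usage with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
T, C, D = 16, 8, 2
feats = rng.standard_normal((T, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) * 0.1 for _ in range(3))
w_o = rng.standard_normal((D, C)) * 0.1

out, attn = self_attention_1d(feats, w_q, w_k, w_v, w_o, beta=0.5)
```

Because the attention map is T by T, its memory cost grows quadratically with the sequence length of the layer it is attached to, which is consistent with the abstract's remark that applying the layer everywhere is only possible "when memory allows".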

Code & Models

Repositories

pquochuy/sasegan (TensorFlow, official)

Taxonomy

Methods: Convolution