Adversarial Feature-Mapping for Speech Enhancement
Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang

TL;DR
This paper introduces an adversarial feature-mapping approach for speech enhancement that uses a discriminator network to improve the quality of enhanced speech features, leading to better ASR performance.
Contribution
It proposes an adversarial training framework for speech enhancement and extends it with senone-aware training for improved ASR results.
Findings
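AFM achieves a 16.95% relative WER reduction over the unprocessed noisy data.
AFM outperforms the feature-mapping baseline by 5.27% relative WER.
Senone-aware AFM (SA-AFM) achieves a 9.85% relative WER improvement over a multi-conditional acoustic model.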
Abstract
Feature-mapping with deep neural networks is commonly used for single-channel speech enhancement, in which a feature-mapping network directly transforms the noisy features to the corresponding enhanced ones and is trained to minimize the mean square errors between the enhanced and clean features. In this paper, we propose an adversarial feature-mapping (AFM) method for speech enhancement which advances the feature-mapping approach with adversarial learning. An additional discriminator network is introduced to distinguish the enhanced features from the real clean ones. The two networks are jointly optimized to minimize the feature-mapping loss and simultaneously mini-maximize the discrimination loss. The distribution of the enhanced features is further pushed towards that of the clean features through this adversarial multi-task training. To achieve better performance on ASR task,…
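The two losses described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic discriminator, the weighting factor `lam`, and all function names are assumptions made for illustration. The feature-mapping loss is the mean squared error between enhanced and clean features, the discrimination loss is a binary cross-entropy that labels clean frames 1 and enhanced frames 0, and the mapper is assumed to minimize the first while maximizing the second.

```python
import math

def mse(enhanced, clean):
    """Mean squared error over two equally shaped lists of feature frames."""
    total, count = 0.0, 0
    for e_frame, c_frame in zip(enhanced, clean):
        for e, c in zip(e_frame, c_frame):
            total += (e - c) ** 2
            count += 1
    return total / count

def discriminator(frame, weights, bias):
    """Toy logistic discriminator (an assumption): P(frame is clean)."""
    score = sum(w * x for w, x in zip(weights, frame)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def discrimination_loss(enhanced, clean, weights, bias, eps=1e-12):
    """Binary cross-entropy: clean frames are labeled 1, enhanced frames 0."""
    loss_clean = -sum(
        math.log(discriminator(f, weights, bias) + eps) for f in clean
    ) / len(clean)
    loss_enhanced = -sum(
        math.log(1.0 - discriminator(f, weights, bias) + eps) for f in enhanced
    ) / len(enhanced)
    return loss_clean + loss_enhanced

def afm_objectives(enhanced, clean, weights, bias, lam=1.0):
    """Return (mapper objective, discriminator objective), both to be minimized.

    lam is a hypothetical trade-off weight between the two losses.
    """
    l_map = mse(enhanced, clean)
    l_disc = discrimination_loss(enhanced, clean, weights, bias)
    # The mapper lowers the MSE and raises the discrimination loss;
    # the discriminator lowers the discrimination loss.
    return l_map - lam * l_disc, l_disc
```

In an actual training loop the enhanced frames would come from the feature-mapping network applied to noisy input, and both networks would be updated by gradient steps on their respective objectives. The sketch only evaluates the objectives for given frames.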
