Unsupervised Face-Masked Speech Enhancement Using Generative Adversarial Networks With Human-in-the-Loop Assessment Metrics
Syu-Siang Wang, Jia-Yang Chen, Bo-Ren Bai, Shih-Hau Fang, Yu Tsao

TL;DR
This paper introduces HL-StarGAN, a novel unsupervised face-masked speech enhancement method that incorporates human-in-the-loop assessment metrics, improving speech quality in masked communication scenarios.
Contribution
The paper presents a new face-masked speech enhancement model with a human-in-the-loop metric predictor (MaskQSS). Trained on a curated face-masked speech database, it outperforms existing methods in both quality prediction and speech enhancement.
Findings
MaskQSS accurately predicts face-masked speech quality.
HL-StarGAN outperforms conventional StarGAN and CycleGAN in speech enhancement.
The method effectively improves speech quality in face-masked scenarios.
Abstract
The use of face masks is an essential healthcare measure, particularly during pandemics, yet masks can make everyday communication more difficult. To address this problem, we propose a novel face-masked speech enhancement method called human-in-the-loop StarGAN (HL-StarGAN). HL-StarGAN comprises a discriminator, a classifier, a metric assessment predictor, and a generator that leverages an attention mechanism. The metric assessment predictor, referred to as MaskQSS, was developed with the involvement of human participants and serves as a "human-in-the-loop" module during the learning process of HL-StarGAN. The overall HL-StarGAN model was trained using an unsupervised learning strategy that simultaneously focuses on reconstructing the original clean speech and optimizing for human perception. To implement HL-StarGAN, we curated a face-masked speech…
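The abstract says HL-StarGAN's generator is trained on an objective that combines reconstruction with human perception, with MaskQSS acting as the human-in-the-loop module. The sketch below shows one way such terms might be combined, assuming a StarGAN-style recipe: a least-squares adversarial term, a domain-classification term, an L1 cycle-reconstruction term, and a metric term that rewards higher scores from a MaskQSS-like predictor. The function names, the 1-to-5 quality scale, the least-squares choice, and all weights are hypothetical illustrations, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights for illustration; not values reported by the paper.
    adv: float = 1.0
    cls: float = 1.0
    rec: float = 10.0
    metric: float = 1.0


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length feature sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def generator_objective(
    d_fake: float,
    cls_loss_fake: float,
    x_masked: Sequence[float],
    x_cycled: Sequence[float],
    predicted_quality: float,
    max_quality: float = 5.0,
    weights: Optional[LossWeights] = None,
) -> Dict[str, float]:
    """Combine StarGAN-style generator terms with a hypothetical metric term.

    d_fake: discriminator output for the enhanced (converted) speech.
    cls_loss_fake: classifier loss on the enhanced speech's target domain.
    x_masked / x_cycled: input features and their cycle reconstruction.
    predicted_quality: score from a predictor standing in for MaskQSS.
    """
    w = weights if weights is not None else LossWeights()
    terms = {
        # Least-squares adversarial loss (an assumed choice of GAN loss).
        "adv": (d_fake - 1.0) ** 2,
        "cls": cls_loss_fake,
        # Cycle reconstruction: masked -> enhanced -> masked should return x.
        "rec": l1_distance(x_masked, x_cycled),
        # Normalized gap between predicted and best-possible quality score.
        "metric": (max_quality - predicted_quality) / max_quality,
    }
    terms["total"] = (
        w.adv * terms["adv"]
        + w.cls * terms["cls"]
        + w.rec * terms["rec"]
        + w.metric * terms["metric"]
    )
    return terms


# Usage with toy numbers: a 3-frame feature track and a predicted score of 4.0.
out = generator_objective(0.5, 0.2, [0.1, 0.2, 0.3], [0.1, 0.25, 0.35], 4.0)
print({k: round(v, 4) for k, v in out.items()})
```

In a real system these scalars would come from neural networks and be differentiated with respect to the generator's parameters; the plain-float version only makes the structure of the combined objective explicit.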
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Infant Health and Development
Methods: Softmax · Attention Is All You Need
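The methods listed above include Softmax and "Attention Is All You Need", and the abstract states that the HL-StarGAN generator leverages an attention mechanism. As a reference point only, the sketch below implements the standard scaled dot-product attention from that Transformer paper, softmax(QK^T / sqrt(d_k))V, over a sequence of frame features. Whether HL-StarGAN's generator uses this exact form, and over which axis (time or frequency), is an assumption not confirmed by this summary.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Return softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    q: (T_q, d_k) queries, k: (T_k, d_k) keys, v: (T_k, d_v) values.
    """
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    d_v = len(v[0])
    out: Matrix = []
    for qi in q:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Weighted sum of value rows for this query.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d_v)])
    return out


# Usage: self-attention over three 2-dimensional frame features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(scaled_dot_product_attention(frames, frames, frames))
```

Because each output row is a softmax-weighted average of the value rows, every frame's output mixes information from all frames, with more weight on frames whose features resemble the query.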
