Improving Neural Silent Speech Interface Models by Adversarial Training
Amin Honarmandi Shandiz, László Tóth, Gábor Gosztolya, Alexandra Markó, Tamás Gábor Csapó

TL;DR
This paper explores the use of adversarial training with GANs to enhance neural silent speech interfaces, showing consistent improvements in speech quality metrics over training with conventional loss functions such as MSE.
Contribution
It introduces an adversarial training framework for articulatory-to-acoustic mapping, improving speech quality in silent speech interfaces.
Findings
Adversarial training slightly improves PESQ scores.
Adversarial training reduces Mel-Cepstral Distortion.
Consistent enhancement across multiple speech quality metrics.
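For readers unfamiliar with the second metric, the following is a minimal sketch of how Mel-Cepstral Distortion is commonly computed, using the widely used textbook definition in dB. The function name, the array layout, and the assumption that the two sequences are already time-aligned with the energy coefficient removed are illustrative choices, not details taken from the paper.

```python
import numpy as np


def mel_cepstral_distortion(ref_mcep, syn_mcep):
    """Mean Mel-Cepstral Distortion (dB) between two mel-cepstral sequences.

    Assumptions (not from the paper): both inputs are arrays of shape
    (frames, coefficients), already time-aligned, with the 0th (energy)
    coefficient removed.
    """
    ref = np.asarray(ref_mcep, dtype=float)
    syn = np.asarray(syn_mcep, dtype=float)
    if ref.shape != syn.shape:
        raise ValueError("sequences must be time-aligned and equally shaped")

    # Per-frame distortion: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    squared_diff = np.sum((ref - syn) ** 2, axis=1)
    per_frame_db = (10.0 / np.log(10.0)) * np.sqrt(2.0 * squared_diff)

    # Average over frames; lower values mean the spectra are closer.
    return float(np.mean(per_frame_db))
```

Lower values indicate that the synthesized spectrum is closer to the reference, which is why a reduction counts as an improvement.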
Abstract
Besides the well-known classification task, these days neural networks are frequently being applied to generate or transform data, such as images and audio signals. In such tasks, the conventional loss functions like the mean squared error (MSE) may not give satisfactory results. To improve the perceptual quality of the generated signals, one possibility is to increase their similarity to real signals, where the similarity is evaluated via a discriminator network. The combination of the generator and discriminator nets is called a Generative Adversarial Network (GAN). Here, we evaluate this adversarial training framework in the articulatory-to-acoustic mapping task, where the goal is to reconstruct the speech signal from a recording of the movement of articulatory organs. As the generator, we apply a 3D convolutional network that gave us good results in an earlier study. To turn it into…
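As a rough illustration of the adversarial framework the abstract describes, the sketch below shows one common way to write the two objectives: the discriminator learns to separate real from generated spectra, while the generator adds an adversarial term to the usual MSE. The function names, the non-saturating form of the generator term, and the `adv_weight` value are assumptions made for illustration; the excerpt does not state the exact loss the authors used.

```python
import numpy as np


def binary_cross_entropy(prob, label, eps=1e-7):
    """Mean binary cross-entropy of probabilities against a constant label."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    return float(np.mean(-(label * np.log(prob)
                           + (1.0 - label) * np.log(1.0 - prob))))


def discriminator_loss(d_real, d_fake):
    """Discriminator objective: score real samples as 1, generated ones as 0."""
    return binary_cross_entropy(d_real, 1.0) + binary_cross_entropy(d_fake, 0.0)


def generator_loss(generated, target, d_fake, adv_weight=0.01):
    """Generator objective: MSE plus a weighted adversarial term.

    adv_weight is a hypothetical hyperparameter, not a value from the paper.
    The adversarial term rewards outputs the discriminator scores as real.
    """
    generated = np.asarray(generated, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = float(np.mean((generated - target) ** 2))
    adversarial = binary_cross_entropy(d_fake, 1.0)
    return mse + adv_weight * adversarial
```

In a training loop the two losses would be minimized alternately, with `d_real` and `d_fake` being the discriminator's output probabilities on real and generated acoustic features respectively.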
