Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion

Zhao Ren; Kevin Scheck; Qinhan Hou; Stefano van Gogh; Michael Wand; Tanja Schultz

arXiv:2405.08021·cs.SD·May 15, 2024

Open Access

TL;DR

Diff-ETS introduces a diffusion probabilistic model to enhance naturalness in electromyography-to-speech conversion, significantly improving speech quality over baseline models by refining acoustic features.

Contribution

This work is the first to apply a score-based diffusion model to electromyography-to-speech conversion for improved speech naturalness.

Findings

01 Diff-ETS outperforms baseline models in naturalness metrics.

02 Diffusion model improves quality of predicted acoustic features.

03 End-to-end training enhances speech synthesis results.

Abstract

Electromyography-to-Speech (ETS) conversion has demonstrated its potential for silent speech interfaces by generating audible speech from Electromyography (EMG) signals during silent articulations. ETS models usually consist of an EMG encoder which converts EMG signals to acoustic speech features, and a vocoder which then synthesises the speech signals. Due to an inadequate amount of available data and noisy signals, the synthesised speech often exhibits a low level of naturalness. In this work, we propose Diff-ETS, an ETS model which uses a score-based diffusion probabilistic model to enhance the naturalness of synthesised speech. The diffusion model is applied to improve the quality of the acoustic features predicted by an EMG encoder. In our experiments, we evaluated fine-tuning the diffusion model on predictions of a pre-trained EMG encoder, and training both models in an end-to-end…
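The three-stage data flow described in the abstract (EMG encoder, diffusion-based refinement of the predicted acoustic features, vocoder) can be sketched roughly as below. This is a toy illustration under stated assumptions, not the authors' implementation: every function name, array size, and hyperparameter is hypothetical, the encoder and vocoder are trivial stand-ins, and the refinement step is a single-noise-level Langevin update driven by an analytic Gaussian score that "knows" the clean features. A real score-based diffusion model would instead learn a time-dependent score network and integrate a reverse-time process.

```python
import numpy as np


def emg_encoder(emg, weights):
    """Stand-in encoder (hypothetical): per-frame linear map from EMG channels to acoustic features."""
    return emg @ weights


def diffusion_refine(coarse, score_fn, n_steps=100, step_size=0.005, rng=None):
    """Toy Langevin-style refinement: start at the encoder output, then
    repeatedly step along the score and inject Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = coarse.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * noise
    return x


def vocoder(features, hop=4):
    """Stand-in vocoder (hypothetical): averages each frame to one value and repeats it `hop` times."""
    return np.repeat(features.mean(axis=1), hop)


# Made-up sizes: 50 frames, 8 EMG channels, 80 acoustic feature bins.
rng = np.random.default_rng(0)
emg = rng.standard_normal((50, 8))
weights = rng.standard_normal((8, 80)) / np.sqrt(8)

# Toy "clean" features and an analytic score of N(target, sigma^2 I).
# Assumption for illustration only: a trained model never sees the target at inference.
target = 0.5 * rng.standard_normal((50, 80))
sigma = 0.1


def toy_score(x):
    return -(x - target) / sigma**2


coarse = emg_encoder(emg, weights)                    # encoder prediction
refined = diffusion_refine(coarse, toy_score, rng=rng)  # refined acoustic features
wave = vocoder(refined)                               # stand-in "waveform"

print("mean abs error before refinement:", np.abs(coarse - target).mean())
print("mean abs error after refinement: ", np.abs(refined - target).mean())
```

The step size is kept below 4 * sigma**2 so that the discretised update stays stable for this toy score; with a learned score network the noise schedule and step sizes would be chosen differently.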

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Phonetics and Phonology Research