Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language
Huiyan Li, Haohong Lin, You Wang, Hengyang Wang, Ming Zhang, Han Gao, Qing Ai, Zhiyuan Luo, and Guang Li

TL;DR
This paper introduces a Seq2Seq model that synthesizes Mandarin Chinese speech from sEMG signals for silent speech decoding. It achieves a low character error rate and shows that the approach works for a tonal language.
Contribution
The study presents an optimized Seq2Seq approach, combining duration regulation with a state-of-the-art vocoder, for reconstructing silent speech in tonal languages, where prior sEMG-based methods have struggled.
Findings
Achieved an average character error rate (CER) of 6.41% in Mandarin silent speech decoding.
Successfully decoded silent speech in Mandarin Chinese, with results assessed through human evaluation.
Demonstrated the model's effectiveness across six speakers.
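The headline result is reported as a character error rate (CER). As a minimal sketch, assuming the standard definition (character-level Levenshtein distance between a reference text and a hypothesis transcript of the reconstructed speech, divided by the number of reference characters), CER could be computed as below. The function names are hypothetical, and the summary does not say how transcripts are obtained or whether the 6.41% figure is averaged per utterance, per speaker, or pooled over the corpus.

```python
from typing import Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters (unit cost for substitution,
    insertion, and deletion). Each Mandarin character counts as one unit."""
    prev = list(range(len(hyp) + 1))  # distances from empty ref to hyp prefixes
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete ref character r
                curr[j - 1] + 1,         # insert hyp character h
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Per-utterance CER: edits divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled CER: total edits over total reference characters.
    This is one possible averaging scheme, not necessarily the paper's."""
    edits = chars = 0
    for ref, hyp in pairs:
        edits += edit_distance(ref, hyp)
        chars += len(ref)
    if chars == 0:
        raise ValueError("no reference characters")
    return edits / chars


if __name__ == "__main__":
    # One substituted character out of six.
    print(cer("今天天气很好", "今天天汽很好"))
```

Pooling edits over the whole corpus weights long utterances more heavily than averaging per-utterance CERs, so the two schemes can give different numbers on the same data.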
Abstract
Silent Speech Decoding (SSD), based on articulatory neuromuscular activity, has become a prevalent task in Brain-Computer Interface (BCI) research in recent years. Many studies have focused on decoding surface electromyography (sEMG) signals recorded from articulatory neuromuscular activity. However, restoring silent speech in tonal languages such as Mandarin Chinese remains difficult. This paper proposes an optimized Sequence-to-Sequence (Seq2Seq) approach to synthesize voice from sEMG-based silent speech. We extract duration information to regulate the sEMG-based silent speech using the audio length. We then build a deep-learning model with an encoder-decoder structure and a state-of-the-art vocoder to generate the audio waveform. Experiments based on six Mandarin Chinese speakers demonstrate that the proposed model can successfully decode silent speech in Mandarin Chinese and achieve a character…
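The abstract says duration information is used to regulate the sEMG sequence to the audio length, but does not spell out the mechanism. As a minimal sketch under that assumption, the snippet below uses a FastSpeech-style length regulator (a swapped-in, well-known technique, not confirmed as the authors' method): each sEMG feature frame is repeated according to an integer duration, and the hypothetical helper `uniform_durations` spreads the target number of audio frames as evenly as possible over the sEMG frames so the expanded sequence matches the audio length exactly.

```python
from typing import List, Sequence

Frame = List[float]  # one sEMG feature vector (e.g. per-channel features)


def uniform_durations(n_src: int, n_tgt: int) -> List[int]:
    """Hypothetical helper: integer durations that distribute n_tgt target
    (audio) frames over n_src source (sEMG) frames as evenly as possible.
    The durations always sum to exactly n_tgt; if n_tgt < n_src, some
    durations are zero, which drops those source frames."""
    if n_src <= 0 or n_tgt < 0:
        raise ValueError("need n_src > 0 and n_tgt >= 0")
    return [((i + 1) * n_tgt) // n_src - (i * n_tgt) // n_src for i in range(n_src)]


def length_regulate(frames: Sequence[Frame], durations: Sequence[int]) -> List[Frame]:
    """FastSpeech-style length regulator: repeat frame i durations[i] times."""
    if len(frames) != len(durations):
        raise ValueError("one duration per frame is required")
    out: List[Frame] = []
    for frame, d in zip(frames, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        out.extend(list(frame) for _ in range(d))  # copy to avoid aliasing
    return out


if __name__ == "__main__":
    semg = [[0.0], [1.0], [2.0], [3.0]]   # 4 sEMG frames, 1 channel
    durs = uniform_durations(len(semg), 10)  # target: 10 audio frames
    aligned = length_regulate(semg, durs)
    print(durs, len(aligned))
```

In a learned system the durations would typically come from an alignment or a duration predictor rather than an even split; the even split here only illustrates how a length regulator makes source and target sequences the same length.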
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Speech Recognition and Synthesis · Speech and Audio Processing
