EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning

Li-Chin Chen, Po-Hsun Chen, Richard Tzong-Han Tsai, and Yu Tsao

arXiv:2206.07860 · cs.SD · November 29, 2023

TL;DR

This paper introduces EPG2S, a multimodal system that combines electropalatography (EPG) and audio signals to improve speech generation and enhancement, particularly for patients with speech impairments, and compares several strategies for fusing the two modalities, finding late fusion the most effective.

Contribution

The study presents a novel multimodal EPG-to-speech system that effectively integrates EPG and audio signals for speech generation and enhancement, exploring multiple fusion strategies.

Findings

01. EPG2S achieves high-quality speech generation from EPG signals alone.

02. Adding noisy speech signals to the EPG input improves speech quality and intelligibility.

03. The late fusion strategy is the most effective for combined speech generation and enhancement.
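To make the difference between fusion strategies concrete, here is a minimal Python sketch under stated assumptions. It is not the EPG2S architecture: the paper's encoders, layer types, and exact fusion points are not described on this page, so the toy `linear` layers, the helper names `early_fusion` and `late_fusion`, and the two-element feature vectors are all hypothetical. Early fusion concatenates per-frame EPG and audio features before a single model; late fusion passes each modality through its own branch and merges the branch outputs only near the end.

```python
from typing import Callable, List, Sequence

# A "layer" maps a feature vector to a feature vector.
Layer = Callable[[Sequence[float]], List[float]]


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Layer:
    """Toy linear layer y = W x + b (hypothetical stand-in for a learned network)."""
    def apply(x: Sequence[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return apply


def early_fusion(epg: List[float], audio: List[float], model: Layer) -> List[float]:
    # Early fusion: concatenate the raw per-frame features, then use one model.
    return model(epg + audio)


def late_fusion(epg: List[float], audio: List[float],
                epg_branch: Layer, audio_branch: Layer, head: Layer) -> List[float]:
    # Late fusion: each modality gets its own branch; the branch outputs are
    # concatenated and passed to a shared head only at the end.
    return head(epg_branch(epg) + audio_branch(audio))


if __name__ == "__main__":
    epg_frame = [1.0, 0.0]    # hypothetical 2-electrode contact pattern
    noisy_audio = [0.5, 0.5]  # hypothetical 2-dim noisy-speech feature
    print(early_fusion(epg_frame, noisy_audio, linear([[1, 1, 1, 1]], [0.0])))  # [2.0]
    print(late_fusion(epg_frame, noisy_audio,
                      linear([[2, 0]], [0.0]),        # EPG branch
                      linear([[1, 1]], [0.0]),        # audio branch
                      linear([[0.5, 0.5]], [0.0])))   # fusion head -> [1.5]
```

In a real system the toy layers would be trained networks operating on sequences of frames; the sketch only shows where the two modalities meet in each strategy.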

Abstract

Speech generation and enhancement based on articulatory movements facilitate communication when verbal communication is not possible, e.g., in patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), which is a monitoring technique that records contact between the tongue and hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that utilizes EPG and speech signals for speech generation and enhancement. Different fusion strategies based on multiple combinations of EPG and noisy speech signals are examined, and the viability of the proposed method is investigated. Experimental results indicate that EPG2S achieves desirable speech generation outcomes based solely on EPG signals. Further, the addition of noisy speech signals is observed…
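As a rough illustration of what an EPG signal can look like as model input, the sketch below treats each EPG frame as a binary grid of tongue-palate contacts and flattens it into a per-frame feature vector. The 62-electrode layout (a front row of 6 electrodes followed by seven rows of 8) is an assumption modeled on common artificial-palate designs, not a detail taken from this paper, and the names `ROW_SIZES`, `flatten_frame`, `contact_ratio`, and `frames_to_features` are hypothetical.

```python
from typing import List

# Assumed electrode layout (hypothetical for this paper): 6 electrodes in the
# front row, then seven rows of 8, for 62 electrodes in total.
ROW_SIZES = [6, 8, 8, 8, 8, 8, 8, 8]

Frame = List[List[int]]  # one row per electrode row; 1 = contact, 0 = no contact


def flatten_frame(frame: Frame) -> List[float]:
    """Check the frame against the assumed layout and flatten it row by row."""
    if [len(row) for row in frame] != ROW_SIZES:
        raise ValueError("frame does not match the assumed electrode layout")
    if any(c not in (0, 1) for row in frame for c in row):
        raise ValueError("EPG contacts must be binary (0 or 1)")
    return [float(c) for row in frame for c in row]


def contact_ratio(frame: Frame) -> float:
    """Fraction of electrodes touched by the tongue in this frame."""
    vec = flatten_frame(frame)
    return sum(vec) / len(vec)


def frames_to_features(frames: List[Frame]) -> List[List[float]]:
    """Turn a sequence of EPG frames into a sequence of per-frame vectors."""
    return [flatten_frame(f) for f in frames]


if __name__ == "__main__":
    # Example frame: only the front row is contacted, everything else is open.
    front_closure = [[1] * ROW_SIZES[0]] + [[0] * n for n in ROW_SIZES[1:]]
    print(len(flatten_frame(front_closure)))  # 62
    print(contact_ratio(front_closure))       # 6/62
```

A sequence of such vectors, one per time step, is the kind of input a sequence model could map to speech features; how EPG2S actually encodes its EPG input is not specified here.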

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders