Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda

TL;DR
This paper introduces a robust linguistic encoder framework for electrolaryngeal speech enhancement that reduces speech-type and speaker mismatches, yielding more intelligible and natural-sounding converted speech.
Contribution
The paper proposes a novel linguistic encoder that projects both electrolaryngeal and typical speech into a unified latent space, and incorporates HuBERT features for speaker mismatch reduction.
Findings
16% improvement in character error rate
Significant enhancement in speech naturalness
Effective reduction of speech and speaker mismatches
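For readers unfamiliar with the metric in the first finding, character error rate (CER) is commonly computed as the character-level edit distance between a reference transcript and a recognizer's output, divided by the reference length. The sketch below is a minimal illustration of that common definition, not the paper's evaluation pipeline; the function names are hypothetical, and this summary does not say whether the 16% figure is an absolute or a relative change.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of five reference characters.
    print(character_error_rate("hello", "hallo"))
```

In practice the hypothesis would come from running a speech recognizer on the converted audio, so a lower CER indicates more intelligible output.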
Abstract
We propose a novel framework for electrolaryngeal (EL) speech intelligibility enhancement through the use of robust linguistic encoders. Pretraining and fine-tuning approaches have proven to work well in this task, but in most cases, various mismatches, such as the speech type mismatch (electrolaryngeal vs. typical) or a speaker mismatch between the datasets used in each stage, can deteriorate the conversion performance of this framework. To resolve this issue, we propose a linguistic encoder robust enough to project both EL and typical speech into the same latent space, while still being able to extract accurate linguistic information, creating a unified representation to reduce the speech type mismatch. Furthermore, we introduce HuBERT output features to the proposed framework for reducing the speaker mismatch, making it possible to effectively use a large-scale parallel dataset during…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
