Digital Voicing of Silent Speech

David Gaddy; Dan Klein

arXiv:2010.02960·eess.AS·October 8, 2020

TL;DR

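This paper converts silently mouthed speech into audible speech from electromyography (EMG) signals. Training on EMG recorded during silent articulation reduces transcription word error rate from 64% to 4% in one data condition and from 88% to 68% in another, compared with a baseline trained only on vocalized data.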

Contribution

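The authors are the first to train speech synthesis models on EMG collected during silently articulated speech, which they do by transferring audio targets from vocalized to silent signals. They also release a new dataset of silent and vocalized facial EMG measurements.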

Findings

01

Training on silent EMG reduced transcription word error rate from 64% to 4% in one data condition.

02

In another data condition, transcription word error rate fell from 88% to 68% relative to a baseline trained only on vocalized data.

03

Released a new dataset of silent and vocalized facial EMG measurements.
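
The findings are reported as transcription word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard word-level edit-distance definition. The function name `word_error_rate` and the example sentences are illustrative assumptions, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); not taken from the paper's code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
```

Because insertions count as errors, WER can exceed 100% when the transcript is much longer than the reference.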

Abstract

In this paper, we consider the task of digitally voicing silent speech, where silently mouthed words are converted to audible speech based on electromyography (EMG) sensor measurements that capture muscle impulses. While prior work has focused on training speech synthesis models from EMG collected during vocalized speech, we are the first to train from EMG collected during silently articulated speech. We introduce a method of training on silent EMG by transferring audio targets from vocalized to silent signals. Our method greatly improves intelligibility of audio generated from silent EMG compared to a baseline that only trains with vocalized data, decreasing transcription word error rate from 64% to 4% in one data condition and 88% to 68% in another. To spur further development on this task, we share our new dataset of silent and vocalized facial EMG measurements.
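
The abstract does not spell out how audio targets are transferred from vocalized to silent signals. One plausible way to do it, sketched below, is to align a silent EMG recording with a vocalized recording of the same utterance using dynamic time warping (DTW) and copy each aligned audio frame across. The choice of DTW, the Euclidean frame distance, and the names `dtw_alignment` and `transfer_targets` are all assumptions made for illustration; this is not presented as the authors' implementation.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def frame_distance(x: Frame, y: Frame) -> float:
    """Euclidean distance between two feature frames (an assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def dtw_alignment(silent: Sequence[Frame], vocalized: Sequence[Frame]) -> List[int]:
    """For each silent frame, return the index of an aligned vocalized frame."""
    n, m = len(silent), len(vocalized)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j]: cheapest alignment of silent[:i] with vocalized[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(silent[i - 1], vocalized[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end. When a silent frame matches several vocalized
    # frames, the earliest one on the path is kept.
    alignment = [0] * n
    i, j = n, m
    while i > 0 and j > 0:
        alignment[i - 1] = j - 1
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda step: step[0])
    return alignment


def transfer_targets(silent_emg, vocalized_emg, vocalized_audio) -> list:
    """Build one audio target per silent EMG frame from the aligned vocalized audio."""
    if len(vocalized_emg) != len(vocalized_audio):
        raise ValueError("vocalized EMG and audio must have the same number of frames")
    return [vocalized_audio[j] for j in dtw_alignment(silent_emg, vocalized_emg)]


# Toy one-dimensional features: the silent recording lingers on the first frame.
silent_emg = [[0.0], [0.0], [1.0], [2.0]]
vocalized_emg = [[0.0], [1.0], [2.0]]
vocalized_audio = ["a0", "a1", "a2"]  # stand-ins for audio feature frames
targets = transfer_targets(silent_emg, vocalized_emg, vocalized_audio)
```

The resulting `targets` list has one entry per silent EMG frame, so a synthesis model could be trained on silent EMG as if time-aligned audio had been recorded with it.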

Code & Models

Repositories

dgaddy/silent_speech · PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Speech and dialogue systems