Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces

Amin Honarmandi Shandiz; László Tóth; Gábor Gosztolya; Alexandra Markó; Tamás Gábor Csapó

arXiv:2106.04552·cs.SD·June 14, 2021

TL;DR

This paper introduces multi-speaker ultrasound-based speaker embeddings using an adapted x-vector framework, demonstrating low recognition error rates and potential for improving silent speech interface accuracy across speakers.

Contribution

It presents the first multi-speaker ultrasound speaker embeddings with effective speaker recognition and explores their application in multi-speaker silent speech synthesis.

Findings

1. Speaker recognition error rates below 3%
2. Embeddings generalize well to unseen speakers
3. Marginal error rate reduction in ultrasound-to-speech conversion

Abstract

Articulatory-to-acoustic mapping seeks to reconstruct speech from a recording of the articulatory movements, for example, an ultrasound video. Just like speech signals, these recordings represent not only the linguistic content, but are also highly specific to the actual speaker. Hence, due to the lack of multi-speaker data sets, researchers have so far concentrated on speaker-dependent modeling. Here, we present multi-speaker experiments using the recently published TaL80 corpus. To model speaker characteristics, we adjusted the x-vector framework popular in speech processing to operate with ultrasound tongue videos. Next, we performed speaker recognition experiments using 50 speakers from the corpus. Then, we created speaker embedding vectors and evaluated them on the remaining speakers. Finally, we examined how the embedding vector influences the accuracy of our ultrasound-to-speech…
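
To make the x-vector idea in the abstract concrete, the following is a minimal sketch of statistics pooling, the step in the standard x-vector design that turns a variable-length sequence of frame-level features into one fixed-size vector per recording. It is not the authors' network: the feature dimension, frame counts, and random inputs are hypothetical placeholders, and the way the paper adapts the frame-level layers to ultrasound tongue images is not described in the text above.

```python
import numpy as np


def stats_pooling(frame_feats: np.ndarray) -> np.ndarray:
    """Pool a (num_frames, feat_dim) array into one (2 * feat_dim,) vector.

    The per-dimension mean and standard deviation over time are concatenated,
    so recordings of different lengths map to vectors of the same size.
    """
    mean = frame_feats.mean(axis=0)
    std = frame_feats.std(axis=0)
    return np.concatenate([mean, std])


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, a common way to compare two speaker embeddings."""
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return float(np.dot(a, b) / denom)


# Hypothetical frame-level features for two recordings of different lengths.
# In a real system these would come from a trained network, not random noise.
rng = np.random.default_rng(0)
utt_a = rng.normal(size=(120, 64))  # 120 frames, 64 features per frame (assumed)
utt_b = rng.normal(size=(95, 64))   # 95 frames, same feature size

emb_a = stats_pooling(utt_a)
emb_b = stats_pooling(utt_b)
similarity = cosine_similarity(emb_a, emb_b)
```

Both pooled vectors have the same length regardless of how many frames went in, which is what allows a single embedding per recording to be compared across speakers or passed on to a downstream model.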
