A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition

Tyler Benster; Guy Wilson; Reshef Elisha; Francis R Willett; Shaul Druckmann

arXiv:2403.05583 · cs.HC · March 12, 2024 · 2 cites

TL;DR

This paper presents MONA LISA, a cross-modal silent speech recognition system with LLM-based scoring adjustment. It reduces word error rate (WER) from 28.8% to 12.2% on the Gaddy (2020) open-vocabulary benchmark, supporting noninvasive silent speech interfaces as a viable alternative to traditional ASR.

Contribution

Introduces MONA, a multimodal model trained with two novel loss functions (crossCon and supTcon), and LISA, an LLM-based scoring adjustment. Together they enable open-vocabulary silent speech recognition with state-of-the-art accuracy.

Findings

01 Reduced silent speech WER from 28.8% to 12.2% on the Gaddy (2020) open-vocabulary benchmark.

02 Achieved 3.7% WER on vocal EMG recordings, down from the previous state of the art of 23.3%.

03 Performed best in the Brain-to-Text 2024 competition, with a top WER of 8.9%.
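
All three findings are reported as word error rate (WER). As a reminder of what that number measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; the paper's own evaluation script may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("the" -> "a") over six reference words.
    print(round(word_error_rate("the cat sat on the mat",
                                "the cat sat on a mat"), 3))  # 0.167
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.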

Abstract

Silent Speech Interfaces (SSIs) offer a noninvasive alternative to brain-computer interfaces for soundless verbal communication. We introduce Multimodal Orofacial Neural Audio (MONA), a system that leverages cross-modal alignment through novel loss functions--cross-contrast (crossCon) and supervised temporal contrast (supTcon)--to train a multimodal model with a shared latent representation. This architecture enables the use of audio-only datasets like LibriSpeech to improve silent speech recognition. Additionally, our introduction of Large Language Model (LLM) Integrated Scoring Adjustment (LISA) significantly improves recognition accuracy. Together, MONA LISA reduces the state-of-the-art word error rate (WER) from 28.8% to 12.2% in the Gaddy (2020) benchmark dataset for silent speech on an open vocabulary. For vocal EMG recordings, our method improves the state-of-the-art from 23.3%…
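
The abstract names crossCon and supTcon but does not define them here. To illustrate the general idea of cross-modal alignment in a shared latent space, the sketch below implements a generic symmetric InfoNCE-style contrastive loss between paired embeddings from two modalities. This is an assumption chosen for illustration, not the authors' loss: the function name `cross_modal_info_nce`, the temperature value, and the pure-Python list representation are all hypothetical, and supTcon is not shown.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(s - row_max) for s in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def cross_modal_info_nce(emg, audio, temperature=0.1):
    """Generic symmetric contrastive loss (illustrative, not the paper's crossCon).

    emg[i] and audio[i] are assumed to be embeddings of the same moment in
    the two modalities; every other pairing in the batch acts as a negative.
    """
    if not emg or len(emg) != len(audio):
        raise ValueError("inputs must be non-empty and equally sized")
    emg = [_normalize(v) for v in emg]
    audio = [_normalize(v) for v in audio]
    # logits[i][j] = cosine similarity of emg[i] and audio[j], scaled.
    logits = [[sum(a * b for a, b in zip(e, u)) / temperature for u in audio]
              for e in emg]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the EMG-to-audio and audio-to-EMG directions.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))
```

A loss of this kind pulls matching cross-modal pairs together and pushes mismatched pairs apart, which is one common way to let an audio-only corpus shape a latent space that a second modality also uses.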

Code & Models

Repositories

tbenst/silent_speech · PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing