Enhancing Listened Speech Decoding from EEG via Parallel Phoneme Sequence Prediction

Jihwan Lee; Tiantian Feng; Aditya Kommineni; Sudarsana Reddy Kadiri; Shrikanth Narayanan

arXiv:2501.04844·eess.AS·January 10, 2025

Open Access · 1 Repo

TL;DR

This paper introduces a novel EEG-based speech decoding method that simultaneously predicts speech waveforms and phoneme sequences, improving accuracy and efficiency for brain-computer interfaces aiding speech-impaired individuals.

Contribution

The paper presents a new multi-task model architecture that jointly decodes speech waveforms and phoneme sequences from EEG signals, outperforming previous methods.

Findings

01 Outperforms previous EEG speech decoding methods.

02 Provides simultaneous decoding of speech and phonemes.

03 Enables real-time, multi-modal speech reconstruction from EEG.

Abstract

Brain-computer interfaces (BCIs) offer numerous human-centered applications, particularly for people with neurological disorders. Decoding text or speech from brain activity is a relevant domain that could improve the quality of life for people with impaired speech perception. We propose a novel approach to enhance listened speech decoding from electroencephalography (EEG) signals by utilizing an auxiliary phoneme predictor that simultaneously decodes textual phoneme sequences. The proposed model architecture consists of three main parts: an EEG module, a speech module, and a phoneme predictor. The EEG module learns to encode EEG signals as EEG embeddings. The speech module generates speech waveforms from the EEG embeddings. The phoneme predictor outputs the decoded phoneme sequences as text. Our proposed approach allows users to obtain decoded…
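The three-part layout described in the abstract can be illustrated with a toy data-flow sketch. This is not the authors' implementation: the class name `ToyParallelDecoder`, the random linear layers, all dimensions, and the per-frame argmax used to pick phonemes are hypothetical choices made only to show one shared EEG embedding feeding two parallel output heads.

```python
import random


def random_layer(n_in, n_out, rng):
    # Stand-in for a learned layer: a fixed random (n_out x n_in) weight matrix.
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply_layer(weights, vector):
    # Plain matrix-vector product.
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


class ToyParallelDecoder:
    """Hypothetical toy model: one shared EEG embedding feeds two output heads."""

    def __init__(self, n_channels=4, embed_dim=8, samples_per_frame=16,
                 n_phonemes=5, seed=0):
        rng = random.Random(seed)
        self.eeg_module = random_layer(n_channels, embed_dim, rng)
        self.speech_module = random_layer(embed_dim, samples_per_frame, rng)
        self.phoneme_predictor = random_layer(embed_dim, n_phonemes, rng)

    def forward(self, eeg_frames):
        # EEG module: each EEG frame (one value per channel) becomes an embedding.
        embeddings = [apply_layer(self.eeg_module, frame) for frame in eeg_frames]
        # Speech module: each embedding becomes a short run of waveform samples.
        waveform = [sample
                    for emb in embeddings
                    for sample in apply_layer(self.speech_module, emb)]
        # Phoneme predictor (assumed per-frame argmax): each embedding becomes
        # the index of its best-scoring phoneme class.
        phoneme_ids = []
        for emb in embeddings:
            scores = apply_layer(self.phoneme_predictor, emb)
            phoneme_ids.append(max(range(len(scores)), key=scores.__getitem__))
        return waveform, phoneme_ids


model = ToyParallelDecoder()
rng = random.Random(1)
# Ten synthetic EEG frames with four channels each (random data, not real EEG).
eeg_frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(10)]
waveform, phoneme_ids = model.forward(eeg_frames)
print(len(waveform), len(phoneme_ids))  # prints: 160 10
```

Both outputs are computed from the same embeddings in a single forward pass, which is the sense in which the phoneme sequence is predicted in parallel with the waveform. A real system would replace the random layers with trained networks and would likely use a sequence-level phoneme decoder rather than a per-frame argmax.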

Code & Models

Repositories

lee-jhwn/icassp25-fesde-phoneme
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · EEG and Brain-Computer Interfaces · Blind Source Separation Techniques