Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction
Mohammed Salah Al-Radhi, Géza Németh, Andon Tchechmedjiev, Binbin Xu

TL;DR
This paper introduces a novel brain-to-speech system using prosody-aware features and a transformer architecture to improve speech reconstruction from intracranial EEG signals.
Contribution
It presents a new pipeline for extracting prosodic features from iEEG data and a transformer model that enhances speech naturalness and intelligibility.
Findings
Outperforms traditional Griffin-Lim and CNN-based methods
Improves speech intelligibility and expressiveness
Demonstrates superior quantitative and perceptual metrics
Abstract
This chapter presents a novel approach to brain-to-speech (BTS) synthesis from intracranial electroencephalography (iEEG) data, emphasizing prosody-aware feature engineering and advanced transformer-based models for high-fidelity speech reconstruction. Driven by the increasing interest in decoding speech directly from brain activity, this work integrates neuroscience, artificial intelligence, and signal processing to generate accurate and natural speech. We introduce a novel pipeline for extracting key prosodic features directly from complex brain iEEG signals, including intonation, pitch, and rhythm. To effectively utilize these crucial features for natural-sounding speech, we employ advanced deep learning models. Furthermore, this chapter introduces a novel transformer encoder architecture specifically designed for brain-to-speech tasks. Unlike conventional models, our architecture…
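To make the prosodic features named in the abstract concrete, here is a minimal sketch of two of them, pitch (fundamental frequency) and energy, computed on a single acoustic speech frame. It is illustrative only: the abstract does not describe how the authors derive such features from iEEG, so the autocorrelation method, the function names `estimate_f0` and `rms_energy`, and the 75 to 400 Hz search range are all assumptions rather than the paper's pipeline.

```python
import math

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame via the autocorrelation peak.

    Hypothetical helper: the search range and the method are assumptions,
    not taken from the paper.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0                          # silent frame: report as unvoiced
    lag_min = int(sample_rate / f0_max)     # shortest period considered
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Normalised autocorrelation at this lag
        corr = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0

def rms_energy(frame):
    """Root-mean-square energy of one frame, a simple loudness proxy."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

# Usage: a synthetic 40 ms frame containing a pure 200 Hz tone at 16 kHz
sample_rate = 16000
frame = [math.sin(2 * math.pi * 200.0 * t / sample_rate) for t in range(640)]
f0 = estimate_f0(frame, sample_rate)
energy = rms_energy(frame)
```

Tracking these two values frame by frame gives a pitch contour and an energy contour, which are common stand-ins for intonation and for rhythm or stress.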
