Reconstructing Speech Stimuli From Human Auditory Cortex Activity Using a WaveNet Approach
Ran Wang, Yao Wang, Adeen Flinker

TL;DR
This study shows that a WaveNet model, trained on limited data, can reconstruct speech stimuli from intracranial recordings of the human superior temporal gyrus (STG), and that the fitted model reveals phoneme-level tuning properties at individual electrodes.
Contribution
The paper introduces a WaveNet-based approach capable of reconstructing speech from limited intracranial data and analyzes cortical phonetic tuning properties.
Findings
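A WaveNet model reconstructs speech stimuli from STG intracranial recordings despite limited training data.
The impulse response of the fitted model shows phoneme-level temporospectral tuning for each recording electrode.
The results support a role for the posterior STG (pSTG) in the phonetic representation of speech.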
Abstract
The superior temporal gyrus (STG) region of cortex critically contributes to speech recognition. In this work, we show that a proposed WaveNet, with limited available data, is able to reconstruct speech stimuli from STG intracranial recordings. We further investigate the impulse response of the fitted model for each recording electrode and observe phoneme-level temporospectral tuning properties for the recorded area of cortex. This discovery is consistent with previous studies implicating the posterior STG (pSTG) in a phonetic representation of speech and provides detailed acoustic features that certain electrode sites possibly extract during speech recognition.
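To make the abstract's two technical ideas more concrete (dilated causal convolution, and probing a fitted model through its impulse response), the following is a minimal illustrative sketch in plain Python. It is not the authors' implementation. The layer weights, the dilation schedule, and the helper names (`dilated_causal_conv`, `apply_stack`, `receptive_field`, `impulse_response`) are all hypothetical. The stack here is also purely linear, whereas an actual WaveNet uses nonlinear activations, so its response to an impulse depends on the operating point.

```python
from typing import List, Tuple

Layer = Tuple[List[float], int]  # (filter weights, dilation); hypothetical toy format


def dilated_causal_conv(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so y[t] never depends on
    future inputs (the "causal" part).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def apply_stack(x: List[float], layers: List[Layer]) -> List[float]:
    """Apply the layers in sequence (linear toy stack, no activations)."""
    for weights, dilation in layers:
        x = dilated_causal_conv(x, weights, dilation)
    return x


def receptive_field(layers: List[Layer]) -> int:
    """Number of past-and-present input samples that can influence one output."""
    return 1 + sum((len(weights) - 1) * dilation for weights, dilation in layers)


def impulse_response(layers: List[Layer], length: int) -> List[float]:
    """Feed a unit impulse at t = 0 and record the output over `length` samples."""
    impulse = [1.0] + [0.0] * (length - 1)
    return apply_stack(impulse, layers)


# Toy example: three 2-tap layers with dilations 1, 2, 4 (assumed values).
layers = [([1.0, 0.5], 1), ([1.0, 0.5], 2), ([1.0, 0.5], 4)]
rf = receptive_field(layers)
h = impulse_response(layers, 12)
print(rf)
print(h)
```

In this toy setting the impulse response is nonzero only within the receptive field, which grows with the sum of the dilations rather than with the number of layers alone. Under these assumptions, repeating the probe separately for each input channel would give one response per electrode, which is the kind of per-electrode view the abstract describes.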
Taxonomy
Topics: Speech and Audio Processing · Neural Networks and Applications · Blind Source Separation Techniques
Methods: Mixture of Logistic Distributions · Dilated Causal Convolution · WaveNet
