Improving Speech Decoding from ECoG with Self-Supervised Pretraining
Brian A. Yuan, Joseph G. Makin

TL;DR
This paper introduces a self-supervised pretraining approach for ECoG data based on wav2vec. It improves the accuracy of speech decoding from neural recordings, reduces the amount of labeled data required, and enables transfer across patients.
Contribution
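It adapts wav2vec, a self-supervised, fully convolutional model originally designed for audio, to ECoG signals. This enables self-supervised pretraining on unlabeled recordings and transfer learning across patients, both of which improve speech decoding from brain recordings.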
Findings
Wav2vec representations outperform original ECoG features in decoding accuracy.
Pretraining on other patients' data further improves performance.
Word error rates decrease by over 50% in the best cases.
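Word error rate (WER), the metric behind the last finding, is the word-level edit distance between a decoded transcript and the reference transcript, divided by the number of reference words. The sketch below is an illustration, not the paper's evaluation code. It computes WER with a standard Levenshtein dynamic program and treats "decrease by over 50%" as a relative reduction, which is an assumption. The function names `word_error_rate` and `relative_reduction` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fractional decrease of new_wer relative to baseline_wer (assumed reading of the claim)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative transcripts, not data from the paper.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

For example, a decoder whose WER falls from a hypothetical 0.40 to 0.20 has a relative reduction of 0.5, that is, 50%.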
Abstract
Recent work on intracranial brain-machine interfaces has demonstrated that spoken speech can be decoded with high accuracy, essentially by treating the problem as an instance of supervised learning and training deep neural networks to map from neural activity to text. However, such networks pay for their expressiveness with very large numbers of labeled data, a requirement that is particularly burdensome for invasive neural recordings acquired from human patients. On the other hand, these patients typically produce speech outside of the experimental blocks used for training decoders. Making use of such data, and data from other patients, to improve decoding would ease the burden of data collection -- especially onerous for dys- and anarthric patients. Here we demonstrate that this is possible, by reengineering wav2vec -- a simple, self-supervised, fully convolutional model that learns…
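The abstract is truncated above. For reference, the original wav2vec model, as published for raw audio, is trained with a contrastive objective. A convolutional encoder maps the input to latent vectors z_i, and a convolutional context network aggregates them into context vectors c_i. For each offset k, a step-specific affine projection h_k(c_i) is then trained to score the true future latent z_{i+k} above distractors sampled from the same sequence. The pure-Python sketch below implements only that per-offset loss and assumes the encoder and context outputs have already been computed. It omits the bias of h_k, samples distractors uniformly (so the true target can occasionally be drawn), and sets the distractor weight equal to the number of negatives, which is our reading of the original wav2vec setup. It says nothing about how this paper adapts the networks to multichannel ECoG. All names (`contrastive_loss_step_k`, `log_sigmoid`, `matvec`, `dot`) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Matrix-vector product, used here as a bias-free h_k(c_i)."""
    return [dot(row, v) for row in m]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def contrastive_loss_step_k(
    z: Sequence[Sequence[float]],    # latents z_1..z_T from the (assumed) encoder
    c: Sequence[Sequence[float]],    # context vectors c_1..c_T from the context network
    h_k: Sequence[Sequence[float]],  # hypothetical projection matrix for offset k
    k: int,
    num_negatives: int = 10,
    rng: Optional[random.Random] = None,
) -> float:
    """wav2vec-style contrastive loss for a single prediction offset k."""
    if rng is None:
        rng = random.Random(0)
    T = len(z)
    lam = num_negatives  # weight on the distractor term (assumed equal to num_negatives)
    total = 0.0
    for i in range(T - k):  # empty range when k >= T, giving a loss of 0.0
        pred = matvec(h_k, c[i])
        # The true future latent z_{i+k} should receive a high score.
        total -= log_sigmoid(dot(z[i + k], pred))
        # Distractors drawn uniformly from the same sequence should score low;
        # their mean approximates the expectation over the distractor distribution.
        neg = sum(
            log_sigmoid(-dot(z[rng.randrange(T)], pred)) for _ in range(num_negatives)
        )
        total -= lam * neg / num_negatives
    return total


# Toy 2-dimensional vectors, purely illustrative.
z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
c = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0], [0.2, 0.3]]
h_1 = [[1.0, 0.0], [0.0, 1.0]]
print(contrastive_loss_step_k(z, c, h_1, k=1, num_negatives=3))
```

In the full model this loss would be summed over several offsets k and minimized with respect to the encoder, the context network, and every h_k. Here the vectors are fixed toy inputs, so the function only evaluates the objective.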
Taxonomy
Topics: Speech Recognition and Synthesis
