Improving Speech Decoding from ECoG with Self-Supervised Pretraining

Brian A. Yuan; Joseph G. Makin

arXiv:2405.18639·q-bio.NC·May 30, 2024·1 cites

TL;DR

This paper introduces a self-supervised pretraining approach that adapts wav2vec to ECoG data, improving speech decoding from neural recordings by drawing on unlabeled recordings and on data from other patients.

Contribution

It adapts wav2vec for ECoG signals, enabling effective self-supervised learning and transfer learning to enhance speech decoding from brain recordings.
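To make the self-supervised idea concrete, here is a minimal NumPy sketch of a wav2vec-style contrastive objective: a context vector at time t is scored against the true latent `step` frames ahead and against randomly drawn distractor latents. This is an illustration only, not the paper's code. The function names, array shapes, and uniform negative sampling are assumptions, and the step-specific affine transform used in the original wav2vec is omitted for brevity. How the paper adapts the objective to ECoG is not specified on this page.

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)


def contrastive_loss(context, latents, step=1, n_negatives=5, rng=None):
    """Binary contrastive loss for predicting latents `step` frames ahead.

    context, latents: arrays of shape (T, D), hypothetical encoder outputs.
    Negatives are drawn uniformly from the same sequence (an assumption;
    a draw may occasionally coincide with the true target).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T = latents.shape[0]
    preds = context[: T - step]            # (T - step, D)
    targets = latents[step:]               # (T - step, D)

    # Score of the true future latent for each time step.
    pos = np.sum(preds * targets, axis=1)  # (T - step,)

    # Scores of randomly sampled distractor latents.
    neg_idx = rng.integers(0, T, size=(T - step, n_negatives))
    negs = latents[neg_idx]                # (T - step, n_negatives, D)
    neg = np.einsum("td,tnd->tn", preds, negs)

    # Push true-target scores up and distractor scores down.
    total = -(log_sigmoid(pos).sum() + log_sigmoid(-neg).sum())
    return total / (T - step)


rng = np.random.default_rng(42)
latents = rng.normal(size=(50, 8))
context = rng.normal(size=(50, 8))
loss = contrastive_loss(context, latents, step=2, n_negatives=4, rng=rng)
```

No labels appear anywhere in this loss, which is what lets such a model train on recordings made outside labeled experimental blocks.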

Findings

1. Wav2vec representations outperform original ECoG features in decoding accuracy.

2. Pretraining on other patients' data further improves performance.

3. Word error rates decrease by over 50% in the best cases.
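For readers unfamiliar with the metric, the sketch below computes word error rate (WER) as the word-level edit distance divided by the reference length, and shows what a relative reduction of 50% means. The sentences and the baseline and improved values are made-up illustrations, not results from the paper, and the helper names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline


# Illustrative sentences only: one substitution and one deletion in six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Illustrative numbers only: going from 40% WER to 20% WER halves the error.
drop = relative_reduction(0.40, 0.20)
```

Note that a 50% decrease in this sense is relative to the baseline WER, not a drop of 50 percentage points.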

Abstract

Recent work on intracranial brain-machine interfaces has demonstrated that spoken speech can be decoded with high accuracy, essentially by treating the problem as an instance of supervised learning and training deep neural networks to map from neural activity to text. However, such networks pay for their expressiveness with very large numbers of labeled data, a requirement that is particularly burdensome for invasive neural recordings acquired from human patients. On the other hand, these patients typically produce speech outside of the experimental blocks used for training decoders. Making use of such data, and data from other patients, to improve decoding would ease the burden of data collection -- especially onerous for dys- and anarthric patients. Here we demonstrate that this is possible, by reengineering wav2vec -- a simple, self-supervised, fully convolutional model that learns…

Code & Models

Repositories

b4yuan/ecog2vec (tf, Official)

Taxonomy

Topics: Speech Recognition and Synthesis