Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals

Hui Zheng; Hai-Teng Wang; Wei-Bang Jiang; Zhong-Tao Chen; Li He; Pei-Yang Lin; Peng-Hu Wei; Guo-Guang Zhao; Yun-Zhe Liu

arXiv:2405.11459·eess.SP·November 4, 2024

TL;DR

This paper introduces Du-IN, a brain decoding model that uses region-level tokens and mask modeling to improve speech decoding from intracranial neural signals, achieving state-of-the-art accuracy on a 61-word classification task.

Contribution

The paper presents a new region-level token-based model with discrete codex-guided mask modeling for speech decoding from sEEG data, outperforming existing methods.
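As a rough illustration of what code-guided mask modeling means in general, the sketch below assigns each token vector to its nearest entry in a small codebook, hides a random subset of tokens, and keeps the hidden tokens' code indices as prediction targets. This is a minimal sketch of the general technique, not the paper's implementation: the function names (`nearest_code`, `make_masked_example`), the zero-vector mask token, the squared-L2 distance, and the toy codebook are all assumptions made for illustration.

```python
import random


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def make_masked_example(tokens, codebook, mask_ratio=0.5, seed=0):
    """Build one masked-modeling training example (hypothetical helper).

    tokens:   list of token vectors (each a list of floats)
    codebook: list of code vectors with the same dimensionality
    Returns (corrupted_tokens, targets), where `targets` maps each masked
    position to the discrete code index a model would be asked to predict.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(tokens)))
    masked_positions = sorted(rng.sample(range(len(tokens)), n_mask))

    # Targets come from the *uncorrupted* tokens, via the codebook.
    targets = {i: nearest_code(tokens[i], codebook) for i in masked_positions}

    # Assumption: a zero vector stands in for a learned mask embedding.
    dim = len(tokens[0])
    corrupted = [
        [0.0] * dim if i in targets else list(tok)
        for i, tok in enumerate(tokens)
    ]
    return corrupted, targets


if __name__ == "__main__":
    toy_codebook = [[0.0, 0.0], [1.0, 1.0]]
    toy_tokens = [[0.1, 0.0], [0.9, 1.2], [0.2, -0.1], [1.1, 0.8]]
    corrupted, targets = make_masked_example(toy_tokens, toy_codebook)
    print(corrupted)
    print(targets)
```

In a real system the codebook would be learned (for example by a vector-quantized autoencoder) and a network would be trained with a classification loss on the target indices; both parts are omitted here.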

Findings

01 Achieved state-of-the-art 61-word classification accuracy.

02 Region-level temporal modeling with 1D depthwise convolution improves performance.

03 Self-supervised mask modeling significantly enhances speech decoding accuracy.
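For readers unfamiliar with the 1D depthwise convolution named in finding 02, the following sketch shows the operation itself: each channel is filtered along time with its own kernel, and channels are never mixed. It is a plain-Python illustration under stated assumptions, not code from the Du-IN repository; the function name `depthwise_conv1d`, the "valid" (no padding) boundary handling, and the toy inputs are all chosen here for illustration.

```python
def depthwise_conv1d(x, kernels):
    """Depthwise 1D convolution (cross-correlation, 'valid' padding).

    x:       list of C channels, each a list of T samples
    kernels: list of C kernels, one per channel, each a list of K weights
    Returns a list of C output channels, each of length T - K + 1.
    """
    if len(x) != len(kernels):
        raise ValueError("depthwise convolution needs one kernel per channel")
    out = []
    for signal, kernel in zip(x, kernels):
        k = len(kernel)
        # Slide this channel's own kernel along time; no cross-channel mixing.
        out.append([
            sum(signal[t + j] * kernel[j] for j in range(k))
            for t in range(len(signal) - k + 1)
        ])
    return out


if __name__ == "__main__":
    x = [[1, 2, 3, 4],   # channel 0
         [0, 1, 0, 1]]   # channel 1
    kernels = [[1, 1],   # moving sum for channel 0
               [1, -1]]  # first difference for channel 1
    print(depthwise_conv1d(x, kernels))  # [[3, 5, 7], [-1, 1, -1]]
```

In deep learning libraries the same operation is usually expressed as a grouped convolution with as many groups as input channels.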

Abstract

Invasive brain-computer interfaces with electrocorticography (ECoG) have shown promise for high-performance speech decoding in medical applications, but less damaging methods like intracranial stereo-electroencephalography (sEEG) remain underexplored. With rapid advances in representation learning, leveraging abundant recordings to enhance speech decoding is increasingly attractive. However, popular methods often pre-train temporal models based on brain-level tokens, overlooking that brain activities in different regions are highly desynchronized during tasks. Alternatively, they pre-train spatial-temporal models based on channel-level tokens but fail to evaluate them on challenging tasks like speech decoding, which requires intricate processing in specific language-related areas. To address this issue, we collected a well-annotated Chinese word-reading sEEG dataset targeting…

Code & Models

Repositories

liulab-repository/du-in (jax, Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Robotics and Automated Systems

Methods: Attention Is All You Need · Depthwise Convolution · Convolution · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding