NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention

Dashanka De Silva; Siqi Cai; Saurav Pahuja; Tanja Schultz; Haizhou Li

arXiv:2409.02489 · cs.SD · September 17, 2024

Open Access

TL;DR

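NeuroSpex is a speaker extraction model that uses the listener's EEG response as the sole auxiliary cue, together with a cross-attention mechanism, to extract the attended speaker from monaural speech mixtures. It is reported to outperform baseline methods on a publicly available dataset.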

Contribution

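The paper introduces a neuro-guided speaker extraction framework that uses EEG as the sole auxiliary cue, a novel EEG signal encoder that captures attention information, and a cross-attention mechanism that enhances the speech feature representations used to generate the speaker extraction mask.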

Findings

01 Outperforms baseline models on a public dataset

02 Effective use of EEG signals for speaker attention

03 Improved speech extraction accuracy

Abstract

In the study of auditory attention, it has been revealed that there exists a robust correlation between attended speech and elicited neural responses, measurable through electroencephalography (EEG). Therefore, it is possible to use the attention information available within EEG signals to guide the extraction of the target speaker in a cocktail party computationally. In this paper, we present a neuro-guided speaker extraction model, i.e. NeuroSpex, using the EEG response of the listener as the sole auxiliary reference cue to extract attended speech from monaural speech mixtures. We propose a novel EEG signal encoder that captures the attention information. Additionally, we propose a cross-attention (CA) mechanism to enhance the speech feature representations, generating a speaker extraction mask. Experimental results on a publicly available dataset demonstrate that our proposed model…
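To make the cross-attention idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the choice of speech frames as queries and EEG frames as keys and values, the residual sum, the sigmoid mask, and all names (`cross_attention`, `speech`, `eeg`, `mask`) are assumptions made for illustration, and learned projections are omitted.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    d = len(keys[0])
    attended, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value rows, one output channel at a time.
        attended.append([sum(wj * v[c] for wj, v in zip(w, values))
                         for c in range(len(values[0]))])
    return attended, weights

# Toy embeddings (hypothetical values): 4 speech frames and 2 EEG frames, 3 channels each.
speech = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [0.0, 0.8, 0.4], [-0.6, 0.1, 1.1]]
eeg = [[1.0, 0.0, -0.5], [0.2, 0.9, 0.3]]

# Assumption: speech frames query the EEG frames (keys and values).
attended, weights = cross_attention(speech, eeg, eeg)

# Assumption: fuse by residual sum, then squash to (0, 1) to form a mask.
fused = [[s + a for s, a in zip(s_row, a_row)] for s_row, a_row in zip(speech, attended)]
mask = [[sigmoid(x) for x in row] for row in fused]

# Apply the mask elementwise to the speech features.
extracted = [[m * s for m, s in zip(m_row, s_row)] for m_row, s_row in zip(mask, speech)]
```

Each row of `weights` is a distribution over EEG frames, so every speech frame receives its own summary of the EEG embedding before the mask is computed. A real model would add learned query, key, and value projections and multiple heads.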

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Softmax · Attention Is All You Need