Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction

Cunhang Fan; Youdian Gao; Zexu Pan; Jingjing Zhang; Hongyu Zhang; Jie Zhang; Zhao Lv

arXiv:2501.01673·cs.SD·January 6, 2025

Open Access

TL;DR

This paper introduces IFENet, a neural network architecture for EEG-guided target speaker extraction. It models speech features with dual-path Mamba and EEG features with Kolmogorov-Arnold Networks (KAN).

Contribution

The paper presents a new feature extraction network combining dual-path Mamba and KAN for improved neuro-oriented target speaker extraction.

Findings

01. Achieved 36% and 29% relative improvements in scale-invariant signal-to-distortion ratio (SI-SDR) on the KUL and AVED datasets, respectively.

02. Outperformed state-of-the-art models on target speaker extraction.

03. Models long speech sequences and EEG features effectively, which helps identify the attended (target) speaker.
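The findings are reported as relative SI-SDR improvements, so a small sketch of how such figures are typically computed may help. This is the standard SI-SDR definition, not code from the paper. The function names `si_sdr` and `relative_improvement` are hypothetical, and the 10.0 dB and 13.6 dB values in the usage lines are made-up illustration numbers, not results from the paper.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("estimate and target must be non-empty and equal length")

    # Remove the mean so a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the scaled target component.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    signal = [scale * b for b in t]
    noise = [a - s for a, s in zip(e, signal)]

    signal_energy = sum(s * s for s in signal)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def relative_improvement(baseline_db, new_db):
    """Relative gain of new_db over baseline_db, as a fraction of the baseline."""
    return (new_db - baseline_db) / abs(baseline_db)


# Usage with a synthetic signal: a sine target plus a weaker cosine distortion.
target = [math.sin(2 * math.pi * k / 16) for k in range(64)]
estimate = [s + 0.1 * math.cos(2 * math.pi * k / 16) for k, s in enumerate(target)]
score_db = si_sdr(estimate, target)

# Hypothetical baseline and improved scores (NOT taken from the paper).
gain = relative_improvement(10.0, 13.6)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric suits extraction systems whose output gain is arbitrary.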

Abstract

The recent rapid development of auditory attention decoding (AAD) offers the possibility of using electroencephalography (EEG) as auxiliary information for target speaker extraction. However, effectively modeling long sequences of speech and resolving the identity of the target speaker from EEG signals remain major challenges. In this paper, an improved feature extraction network (IFENet) is proposed for neuro-oriented target speaker extraction, which mainly consists of a speech encoder with dual-path Mamba and an EEG encoder with Kolmogorov-Arnold Networks (KAN). We propose SpeechBiMamba, which makes use of dual-path Mamba in modeling local and global speech sequences to extract speech features. In addition, we propose EEGKAN to effectively extract EEG features that are closely related to the auditory stimuli and locate the target speaker through the subject's attention information.…
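To illustrate what "local and global" modeling usually means in a dual-path design, here is a minimal sketch of the chunking step. It assumes the common dual-path layout, in which a long sequence is split into overlapping chunks, a sequence model runs within each chunk (local), and another runs across chunks (global). The abstract does not give these details, so the chunk length, hop size, and the helper names `segment` and `dual_path_views` are assumptions, and the sequence model itself (Mamba in the paper) is omitted.

```python
import math


def segment(seq, chunk_len, hop):
    """Split a 1-D sequence into overlapping chunks, zero-padding the tail."""
    if chunk_len <= 0 or hop <= 0:
        raise ValueError("chunk_len and hop must be positive")
    n_chunks = max(1, math.ceil((len(seq) - chunk_len) / hop) + 1)
    padded_len = (n_chunks - 1) * hop + chunk_len
    padded = list(seq) + [0.0] * (padded_len - len(seq))
    return [padded[i * hop:i * hop + chunk_len] for i in range(n_chunks)]


def dual_path_views(chunks):
    """Return the two views a dual-path model alternates between.

    intra: one row per chunk, so a model sees neighbouring frames (local).
    inter: one row per in-chunk position, so a model sees the same position
           across all chunks (global).
    """
    intra = [list(row) for row in chunks]
    inter = [list(col) for col in zip(*chunks)]
    return intra, inter


# Usage: 10 frames, chunks of 4 frames with 50% overlap (assumed values).
frames = list(range(10))
chunks = segment(frames, chunk_len=4, hop=2)
intra, inter = dual_path_views(chunks)
```

With this layout each sequence model only sees rows of length `chunk_len` or `n_chunks`, rather than the full sequence length, which is the usual motivation for dual-path processing of long speech inputs.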

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing