Refining Self-Supervised Learnt Speech Representation using Brain Activations
Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Liping Chen, Jie Zhang, and Zhenhua Ling

TL;DR
This paper proposes using brain activation data from fMRI to refine self-supervised speech models like wav2vec2.0, improving their performance on various speech-related tasks by aligning model representations with human neural responses.
Contribution
It introduces a method that refines self-supervised speech models by aligning their representations with brain activation data, a way of exploiting model-brain similarity for optimization that had not previously been explored.
Findings
Refinement improves performance on SUPERB downstream tasks such as speaker verification, automatic speech recognition, and intent classification.
Alignment with brain activations benefits multiple speech processing tasks.
Proposed method offers a new way to optimize self-supervised speech models.
Abstract
Prior work has shown that speech representations extracted by self-supervised pre-trained models exhibit similarities with human brain activations during speech perception, and that fine-tuning speech representation models on downstream tasks can further improve this similarity. However, it remains unclear whether this similarity can be used to optimize the pre-trained speech models. In this work, we therefore propose to use brain activations recorded by fMRI to refine the widely used wav2vec2.0 model by aligning model representations toward human neural responses. Experimental results on SUPERB reveal that this operation is beneficial for several downstream tasks, e.g., speaker verification, automatic speech recognition, and intent classification. The proposed method can thus be considered a new alternative for improving self-supervised speech models.
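The abstract does not state which objective is used to align model representations with fMRI responses. As a rough illustration only, one common way to quantify model-brain similarity is the per-voxel Pearson correlation between responses predicted from model features and the recorded responses. The sketch below assumes that measure; the names `pearson` and `alignment_loss` and the choice of `1 - mean correlation` as a loss are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length time series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        # A constant series carries no correlation information.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def alignment_loss(
    predicted: Sequence[Sequence[float]],
    recorded: Sequence[Sequence[float]],
) -> float:
    """Hypothetical alignment objective: 1 minus the mean per-voxel correlation.

    Both arguments are lists over voxels, each holding one time series.
    `predicted` stands in for responses mapped from model features and
    `recorded` for fMRI measurements (an assumption for illustration).
    """
    if len(predicted) != len(recorded) or len(predicted) == 0:
        raise ValueError("need the same non-zero number of voxels in both inputs")
    scores = [pearson(p, r) for p, r in zip(predicted, recorded)]
    return 1.0 - sum(scores) / len(scores)
```

Under this assumed objective, a lower value means the predicted responses track the recorded ones more closely, so minimizing it during refinement would pull the model's representations toward the neural data.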
Taxonomy
Topics: Speech and dialogue systems · Intelligent Tutoring Systems and Adaptive Learning · EEG and Brain-Computer Interfaces
