Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation

Cong Han; Vishal Choudhari; Yinghao Aaron Li; Nima Mesgarani

arXiv:2302.05756·eess.AS·February 14, 2023·1 cites

TL;DR

This study demonstrates that self-supervised learned speech representations significantly improve the accuracy and speed of auditory attention decoding in multi-talker environments, advancing brain-controlled hearing technologies.

Contribution

The paper introduces the use of WavLM's self-supervised speech representations to enhance auditory attention decoding accuracy and efficiency over traditional waveform and spectrogram methods.

Findings

1. WavLM representations outperform traditional methods in decoding accuracy.

2. Decoding speed is improved using self-supervised speech features.

3. Results suggest potential for brain-controlled hearing aids.

Abstract

Auditory attention decoding (AAD) is a technique used to identify and amplify the talker that a listener is focused on in a noisy environment. This is done by comparing the listener's brainwaves to a representation of all the sound sources to find the closest match. The representation is typically the waveform or spectrogram of the sounds. The effectiveness of these representations for AAD is uncertain. In this study, we examined the use of self-supervised learned speech representation in improving the accuracy and speed of AAD. We recorded the brain activity of three subjects using invasive electrocorticography (ECoG) as they listened to two conversations and focused on one. We used WavLM to extract a latent representation of each talker and trained a spatiotemporal filter to map brain activity to intermediate representations of speech. During the evaluation, the reconstructed…
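
The decoding pipeline outlined in the abstract (train a spatiotemporal filter from neural activity to a speech representation, then match the reconstruction against each talker) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: random vectors stand in for WavLM features and for ECoG recordings, the filter is fitted with ridge regression, and "closest match" is scored with Pearson correlation, since the abstract is truncated before it states the comparison actually used. The function names (`lag_matrix`, `fit_decoder`, `decode_attention`) are hypothetical, not from the paper.

```python
import numpy as np

def lag_matrix(neural, n_lags):
    """Stack time-shifted copies of neural data: (T, C) -> (T, C * n_lags)."""
    T, C = neural.shape
    lagged = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        # Row t receives the neural sample at time t + k (zero-padded at the end).
        lagged[: T - k, k * C:(k + 1) * C] = neural[k:]
    return lagged

def fit_decoder(neural, target, n_lags=8, ridge=1.0):
    """Ridge-regression spatiotemporal filter from neural data to speech features."""
    X = lag_matrix(neural, n_lags)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ target)

def mean_correlation(a, b):
    """Pearson correlation per feature dimension, averaged over dimensions."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    denom = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + 1e-12
    return float(np.mean((a * b).sum(axis=0) / denom))

def decode_attention(neural, candidates, weights, n_lags=8):
    """Reconstruct speech features and return the best-matching talker index."""
    recon = lag_matrix(neural, n_lags) @ weights
    scores = [mean_correlation(recon, c) for c in candidates]
    return int(np.argmax(scores)), scores

# Synthetic demonstration (stand-in data, not real ECoG or WavLM features).
rng = np.random.default_rng(0)
n_channels, n_features = 16, 4
mixing = rng.standard_normal((n_features, n_channels))

def simulate_neural(attended):
    # Assumption: neural activity is a noisy linear image of the attended talker.
    noise = rng.standard_normal((attended.shape[0], n_channels))
    return attended @ mixing + 0.5 * noise

# Training segment: the listener attends talker 0.
train_talkers = [rng.standard_normal((2000, n_features)) for _ in range(2)]
weights = fit_decoder(simulate_neural(train_talkers[0]), train_talkers[0])

# Evaluation segment: the listener attends talker 1.
test_talkers = [rng.standard_normal((500, n_features)) for _ in range(2)]
winner, scores = decode_attention(simulate_neural(test_talkers[1]), test_talkers, weights)
print("decoded talker:", winner)
```

In a real setting the candidate features would come from a pretrained speech model applied to each talker's audio, and the evaluation window length would trade off decoding accuracy against decoding speed.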

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Blind Source Separation Techniques · Hearing Loss and Rehabilitation

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings