TL;DR
This paper introduces a joint CNN-LSTM neural network model that accurately infers auditory attention from EEG and speech signals in multi-speaker scenarios, advancing neuro-feedback applications in hearing aids.
Contribution
The study presents a novel deep learning approach combining CNN and LSTM to classify auditory attention from EEG and speech spectrograms, outperforming traditional linear methods.
Findings
Median decoding accuracy of 77.2% at a trial duration of 3 seconds
Model tolerates up to 50% sparsity without substantial accuracy loss
Effective across three different languages and datasets
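To make the sparsity finding concrete, the sketch below shows one common way to reach a target sparsity: magnitude pruning, which zeroes the weights with the smallest absolute values. This is an illustration only. The function names `magnitude_prune` and `sparsity_of` and the toy weight list are hypothetical, and the summary above does not specify the pruning schedule or which layers were pruned.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    if not weights:
        return 0.0
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Toy weights (made up for the example), pruned to 50% sparsity.
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2, -0.4, 0.02]
pruned = magnitude_prune(weights, 0.5)
print(pruned)
print(sparsity_of(pruned))
```

In a real network the same idea would be applied per layer or globally to weight tensors, typically followed by fine-tuning; the finding above says decoding accuracy held up to roughly this level of sparsity.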
Abstract
The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multi-speaker scenario. It has recently been shown that we can quantitatively evaluate the segregation capability by modelling the relationship between the speech signals present in an auditory scene and the cortical signals of the listener measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids whereby the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer the auditory attention are based on linear systems theory, where speech cues such as envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) - long short-term memory (LSTM) model to infer the auditory attention. Our joint CNN-LSTM model takes the EEG signals and the…
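For context on the linear baseline the abstract mentions, here is a minimal sketch of a correlation-based attention decision. It assumes an envelope has already been estimated from the EEG by some linear model (that step is not shown) and simply picks the speaker whose speech envelope correlates more strongly with it. The names `pearson` and `decode_attention` and the toy envelopes are hypothetical, and this is one common variant of the linear approach rather than the paper's CNN-LSTM method.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def decode_attention(estimated_envelope, envelope_a, envelope_b):
    """Label the attended speaker as the one with the higher correlation."""
    r_a = pearson(estimated_envelope, envelope_a)
    r_b = pearson(estimated_envelope, envelope_b)
    label = "A" if r_a >= r_b else "B"
    return label, r_a, r_b


# Toy data (made up): the EEG-derived estimate tracks speaker A more closely.
estimated = [0.1, 0.5, 0.9, 0.4, 0.2]
speaker_a = [0.0, 0.6, 1.0, 0.5, 0.1]
speaker_b = [0.9, 0.2, 0.1, 0.7, 0.8]
label, r_a, r_b = decode_attention(estimated, speaker_a, speaker_b)
print(label, r_a, r_b)
```

Decisions of this kind are usually made over a fixed window of data, which is why decoding accuracy is reported at a given trial duration.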
Taxonomy
Methods: Pruning
