Uncovering the role of semantic and acoustic cues in normal and dichotic listening

Sai Samrat Kankanala; Akshara Soman; Sriram Ganapathy

arXiv:2411.11308·eess.AS·July 31, 2025

Open Access

TL;DR

This study investigates how acoustic and semantic cues contribute to speech comprehension in normal and dichotic listening, using EEG data and a novel deep learning model to quantify their roles.

Contribution

The paper introduces a multimodal deep learning model and a match-mismatch task to quantify acoustic and semantic cue roles in speech perception under different listening conditions.

Findings

01. Semantic cues improve match-mismatch (MM) classification in dichotic listening.

02. Speech perception is fragmented at word boundaries.

03. The proposed model outperforms previous MM models.

Abstract

Speech comprehension is an involuntary task for the healthy human brain, yet the mechanisms underlying this brain function remain poorly understood. In this paper, we aim to quantify the role of acoustic and semantic information streams in complex listening conditions. We propose a paradigm to understand the encoding of speech cues in electroencephalogram (EEG) data by designing a match-mismatch (MM) classification task. The MM task involves identifying whether the stimulus (speech) and response (EEG) correspond to each other. We build a multimodal deep-learning-based sequence model, STEM, which takes as input the acoustic stimulus (speech envelope), the semantic stimulus (textual representations of speech), and the neural response (EEG data). We perform extensive experiments on two separate conditions: i) natural passive listening, and ii) a dichotic listening requiring…
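To make the match-mismatch setup concrete, the following is a minimal sketch of how labelled (EEG, envelope) segment pairs could be assembled for such a task. It is an illustration only, not the authors' pipeline: the function name `make_mm_pairs`, the fixed-shift rule for drawing the mismatched segment, the segment length, and the toy data are all assumptions, and the semantic (text) stream that STEM also receives is omitted.

```python
import random


def make_mm_pairs(eeg, envelope, seg_len, shift, seed=0):
    """Build labelled (EEG, envelope) segment pairs for an MM task.

    eeg:      list of per-sample channel vectors, time-aligned with envelope
    envelope: list of per-sample speech envelope values
    seg_len:  segment length in samples (hypothetical choice)
    shift:    offset in samples used to pick the mismatched envelope
              segment (an assumed sampling rule, not taken from the paper)
    Returns a shuffled list of (eeg_segment, envelope_segment, label),
    where label 1 means matched and label 0 means mismatched.
    """
    pairs = []
    # Stop early enough that the shifted (mismatched) segment still fits.
    last_start = len(envelope) - seg_len - shift
    for start in range(0, last_start + 1, seg_len):
        eeg_seg = eeg[start:start + seg_len]
        matched = envelope[start:start + seg_len]
        mismatched = envelope[start + shift:start + shift + seg_len]
        pairs.append((eeg_seg, matched, 1))
        pairs.append((eeg_seg, mismatched, 0))
    random.Random(seed).shuffle(pairs)
    return pairs


# Toy data: 1000 samples, 2 EEG channels (values are random placeholders).
gen = random.Random(1)
toy_envelope = [gen.random() for _ in range(1000)]
toy_eeg = [[gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)] for _ in range(1000)]

toy_pairs = make_mm_pairs(toy_eeg, toy_envelope, seg_len=100, shift=200)
print(len(toy_pairs), sum(label for _, _, label in toy_pairs))
```

A binary classifier trained on such pairs has to decide, for each EEG segment, whether the accompanying stimulus segment is the one that was actually heard. Each EEG segment appears once with each label, so the classes are balanced by construction.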


Taxonomy

Topics: Speech and Audio Processing

Methods: Softmax · Attention Is All You Need