AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Xilin Jiang; Sukru Samet Dindar; Vishal Choudhari; Stephan Bickel; Ashesh Mehta; Guy M McKhann; Daniel Friedman; Adeen Flinker; Nima Mesgarani

arXiv:2502.16794 · cs.SD · June 12, 2025

TL;DR

This paper introduces AAD-LLM, an auditory large language model that uses intracranial brain recordings to infer which speaker a listener is attending to in a multitalker scene, so that its responses are more relevant to, and better aligned with, what the listener actually perceives.

Contribution

It introduces the Intention-Informed Auditory Scene Understanding (II-ASU) task and presents AAD-LLM, a prototype system that integrates neural signals with an auditory LLM to infer listener attention and refine its responses accordingly.

Findings

01

Improved alignment with listener intention in multitalker scenarios

02

Improved accuracy in identifying the attended speaker from neural data

03

Better performance in speech transcription and question answering

Abstract

Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then…
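The abstract describes a two-stage pipeline whose first step predicts the attended speaker from neural activity, but it does not say how that prediction is made. Purely as a point of reference, the following sketch shows the classic stimulus-reconstruction approach to auditory attention decoding: a linear decoder (assumed here to be already trained, and not shown) reconstructs a speech envelope from the neural recording, and the candidate speaker whose envelope correlates best with that reconstruction is selected. This is not presented as AAD-LLM's own decoder; the function names and toy signals are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length signals.

    Returns 0.0 if either signal is constant (zero variance).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("signals must have equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def decode_attended_speaker(
    reconstructed_envelope: Sequence[float],
    speaker_envelopes: Sequence[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Hypothetical selection step of correlation-based attention decoding.

    `reconstructed_envelope` stands in for the output of an assumed,
    pre-trained neural decoder; each entry of `speaker_envelopes` is the
    envelope of one talker in the mixture. Returns the index of the
    best-matching talker and the correlation score for every talker.
    """
    scores = [pearson(reconstructed_envelope, env) for env in speaker_envelopes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Toy envelopes for two talkers (illustrative values, not real data).
    speaker_a = [0.0, 1.0, 0.5, 0.2, 0.9, 0.1]
    speaker_b = [0.8, 0.1, 0.3, 0.9, 0.0, 0.6]
    # Pretend the decoder produced a noisy copy of talker A's envelope.
    reconstruction = [0.1, 0.9, 0.6, 0.1, 0.8, 0.2]
    idx, scores = decode_attended_speaker(reconstruction, [speaker_a, speaker_b])
    print("attended talker:", idx, "scores:", [round(s, 3) for s in scores])
```

In practice such decisions are usually made over windows of several seconds, because correlations computed on short segments are noisy; the attended-talker index would then be passed to the downstream model as conditioning.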

Videos

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding (video on Underline)