TL;DR
This paper introduces AAD-LLM, an auditory LLM that uses brain signals (intracranial EEG) to decode which speaker a listener is attending to in a multitalker scene, so that its responses align with what the listener actually perceives.
Contribution
It defines the task of Intention-Informed Auditory Scene Understanding (II-ASU) and presents AAD-LLM, a prototype system that integrates neural signals with an auditory LLM to infer listener attention and respond accordingly.
Findings
Improved alignment with listener intention in multitalker scenarios
Enhanced accuracy in speaker identification from neural data
Better performance in speech transcription and question answering
Abstract
Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then…
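The first step named in the abstract, predicting the attended speaker from neural activity, can be illustrated with a minimal sketch. This is not the paper's decoder. It is a generic correlation-based approach that assumes a speech envelope has already been reconstructed from the neural recording and that a separate envelope is available for each speaker. All function names, the synthetic data, and the prompt tag format are hypothetical and exist only for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0


def decode_attended_speaker(neural_envelope, speaker_envelopes):
    """Pick the speaker whose envelope best matches the neural estimate.

    Assumption: neural_envelope was reconstructed from brain recordings
    by some upstream model that is not shown here.
    """
    scores = [pearson(neural_envelope, env) for env in speaker_envelopes]
    attended = max(range(len(scores)), key=lambda i: scores[i])
    return attended, scores


def build_prompt(attended_idx, question):
    """Hypothetical conditioning: tag the question with the decoded speaker."""
    return f"<attended_speaker={attended_idx}> {question}"


# Synthetic two-speaker scene: the listener attends to speaker 1.
rng = random.Random(0)
speaker_a = [rng.gauss(0, 1) for _ in range(1000)]
speaker_b = [rng.gauss(0, 1) for _ in range(1000)]
# Simulated neural estimate: the attended envelope plus noise.
neural = [x + 0.5 * rng.gauss(0, 1) for x in speaker_b]

idx, scores = decode_attended_speaker(neural, [speaker_a, speaker_b])
prompt = build_prompt(idx, "What did the speaker say?")
print(idx, prompt)
```

The two stages are kept separate so that the decoder's output is an explicit speaker index that a downstream model can be conditioned on. How AAD-LLM actually performs that conditioning is not stated in the truncated abstract.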
