NAACA: Training-Free NeuroAuditory Attentive Cognitive Architecture with Oscillatory Working Memory for Salience-Driven Attention Gating
Zhongju Yuan, Geraint Wiggins, Dick Botteldooren

TL;DR
NAACA is a training-free, neuro-inspired architecture that uses an Oscillatory Working Memory (OWM) to decide when an Audio Language Model (ALM) should be invoked, so that rare, salient events in long recordings are not diluted by dominant background sound.
Contribution
NAACA introduces a training-free, neuro-inspired architecture with oscillatory working memory for salience-driven attention gating, targeting the attention bottleneck that ALMs face on long-form recordings.
Findings
On XD-Violence, NAACA improves AudioQwen's average precision (AP) from 53.50% to 70.60%, a gain of 17.10 percentage points.
In qualitative case studies on USoW, OWM captures novel events and remains robust to noise.
NAACA reduces unnecessary ALM invocations.
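For readers unfamiliar with the metric, the following is a minimal sketch of the standard non-interpolated average precision over a ranked list of scores. The function name `average_precision` and the toy labels and scores are illustrative assumptions; the exact evaluation protocol used for XD-Violence is not described in this summary.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP: mean of precision@rank taken at each positive item.

    labels: 1 for a positive (e.g. an anomalous segment), 0 otherwise.
    scores: higher means the model is more confident the item is positive.
    """
    # Rank items from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # With no positives the metric is undefined; return 0.0 by convention here.
    return precision_sum / hits if hits else 0.0


# Toy data (made up): positives land at ranks 1 and 3.
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In the toy call, the two positives contribute precisions of 1/1 and 2/3, so `toy_ap` is their mean, 5/6.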
Abstract
Audio provides critical situational cues, yet current Audio Language Models (ALMs) face an attention bottleneck in long-form recordings, where dominant background patterns can dilute rare, salient events. We introduce NAACA, a training-free NeuroAuditory Attentive Cognitive Architecture that reframes attention allocation as an auditory salience filtering problem. At its core is OWM, a neuro-inspired Oscillatory Working Memory that maintains stable attractor-like states and triggers higher-level ALM reasoning only when adaptive energy fluctuations signal perceptual salience. On XD-Violence, NAACA improves AudioQwen's average precision (AP) from 53.50% to 70.60% while reducing unnecessary ALM invocations. Furthermore, qualitative case studies on the Urban Soundscapes of the World (USoW) dataset show that OWM captures novel events and subcategory…
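The gating idea in the abstract, invoking the expensive model only when an energy signal departs from its recent baseline, can be illustrated with a generic adaptive-threshold gate. This is a rough sketch under stated assumptions, not the OWM dynamics themselves: it swaps in a plain exponentially weighted mean and variance for the oscillatory memory, and the class `AdaptiveSalienceGate`, the helper `gated_indices`, and the parameters `alpha`, `k`, and `warmup` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AdaptiveSalienceGate:
    """Hypothetical gate: flags frames whose energy deviates from a running baseline."""

    alpha: float = 0.1  # smoothing factor for the running statistics (assumed)
    k: float = 3.0      # threshold in running standard deviations (assumed)
    warmup: int = 5     # frames observed before any frame may be flagged (assumed)
    mean: float = 0.0
    var: float = 0.0
    seen: int = 0

    def step(self, energy: float) -> bool:
        """Return True if this frame should trigger the expensive model."""
        self.seen += 1
        if self.seen == 1:
            # First frame only initialises the baseline.
            self.mean = energy
            return False
        deviation = energy - self.mean
        std = self.var ** 0.5
        salient = self.seen > self.warmup and abs(deviation) > self.k * std + 1e-8
        # Update the baseline after the decision so a spike cannot mask itself.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return salient


def gated_indices(energies: Iterable[float], gate: AdaptiveSalienceGate) -> List[int]:
    """Indices of frames at which the downstream model would be invoked."""
    return [i for i, e in enumerate(energies) if gate.step(e)]


# Toy energy trace (made up): steady background with one loud event at index 10.
trace = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.05, 0.95, 1.0, 8.0, 1.0, 1.0]
triggers = gated_indices(trace, AdaptiveSalienceGate())
```

On this toy trace the gate flags only index 10, so a downstream model would be called once rather than 13 times. Because the threshold adapts to the running variance, the frames after the spike are not flagged even though the baseline has shifted.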
