StreamAAD: Decoding Spatial Auditory Attention with a Streaming Architecture
Zelin Qiu, Dingding Yao, Junfeng Li

TL;DR
StreamAAD is a streaming architecture for spatial auditory attention decoding that models relationships between decision windows. Combined with a model ensemble, it outperformed the baseline and ranked first in Track 1 of the Chinese AAD Challenge at ISCSLP 2024.
Contribution
The paper proposes a novel streaming decoding architecture, StreamAAD, that considers relationships between decision windows for improved spatial auditory attention decoding.
Findings
Achieved first place in Track 1 of the Chinese AAD Challenge at ISCSLP 2024.
Significantly outperformed the baseline.
Demonstrated the effectiveness of inter-window relationship modeling.
Abstract
In this paper, we present our approach for Track 1 of the Chinese Auditory Attention Decoding (Chinese AAD) Challenge at ISCSLP 2024. Most existing spatial auditory attention decoding (Sp-AAD) methods employ an isolated-window architecture that focuses solely on global invariant features and does not consider relationships between different decision windows, which can lead to suboptimal performance. To address this issue, we propose a novel streaming decoding architecture, termed StreamAAD. In StreamAAD, decision windows are input to the network as a sequential stream and decoded in order, allowing the model to capture inter-window relationships. Additionally, we employ a model ensemble strategy, achieving significantly better performance than the baseline and ranking first in the challenge.
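The contrast between isolated-window and streaming decoding can be illustrated with a toy sketch. This is not the authors' network: the per-window score, the exponential carry-over state (a stand-in for a learned sequential model), the label convention (1 and 0 for two attended directions), and the probability-averaging ensemble are all assumptions made here for illustration, since the abstract does not specify these details.

```python
from typing import List, Sequence


def window_score(window: Sequence[float]) -> float:
    """Hypothetical per-window feature: the mean of the window's samples."""
    return sum(window) / len(window)


def decode_isolated(windows: Sequence[Sequence[float]]) -> List[int]:
    """Isolated-window decoding: each decision uses only its own window."""
    return [1 if window_score(w) > 0 else 0 for w in windows]


def decode_streaming(windows: Sequence[Sequence[float]], carry: float = 0.8) -> List[int]:
    """Streaming decoding: windows are processed in order and a running
    state carries evidence from earlier windows into later decisions.
    The exponential update is an assumed stand-in for a learned model."""
    state = 0.0
    decisions = []
    for w in windows:
        state = carry * state + (1.0 - carry) * window_score(w)
        decisions.append(1 if state > 0 else 0)
    return decisions


def ensemble_vote(model_probs: Sequence[Sequence[float]]) -> List[int]:
    """Assumed ensemble rule: average each window's class-1 probability
    across models, then threshold at 0.5."""
    n_models = len(model_probs)
    n_windows = len(model_probs[0])
    averaged = [
        sum(probs[i] for probs in model_probs) / n_models
        for i in range(n_windows)
    ]
    return [1 if p > 0.5 else 0 for p in averaged]


# Four consecutive decision windows; the third is a noisy outlier.
windows = [[0.5, 0.5], [0.6, 0.6], [-0.1, -0.1], [0.4, 0.4]]
print(decode_isolated(windows))   # the outlier window flips the decision
print(decode_streaming(windows))  # carried state keeps the decision stable
```

In this toy setting, the isolated decoder changes its decision on the outlier window, while the streaming decoder keeps it because evidence from the preceding windows is still present in the state. The trade-off is that a decoder with carried state responds more slowly to a genuine switch of attention.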
Taxonomy
Topics: Music and Audio Processing · Tactile and Sensory Interactions · Multisensory perception and integration
