Distributed Asynchronous Device Speech Enhancement via Windowed Cross-Attention

Gene-Ping Yang; Sebastian Braun

arXiv:2507.16104 · eess.AS · July 23, 2025 · WASPAA

Open Access

TL;DR

This paper introduces a windowed cross-attention module for neural multi-microphone processing that effectively handles asynchronous microphones with varying latency and clock drift, improving speech enhancement in dynamic environments.

Contribution

It presents a novel windowed cross-attention mechanism that aligns features across asynchronous microphones, enhancing existing models for real-world scenarios.

Findings

01 Outperforms TAC in noisy reverberant environments

02 Faster convergence and better learning in experiments

03 Effective in multi-talker and asynchronous setups

Abstract

The increasing number of microphone-equipped personal devices offers great flexibility and potential for use as ad-hoc microphone arrays in dynamic meeting environments. However, most existing approaches are designed for time-synchronized microphone setups, a condition that may not hold in real-world meeting scenarios, where latency and clock drift vary across devices. Under such conditions, we found transform-average-concatenate (TAC), a popular module for neural multi-microphone processing, insufficient for handling time-asynchronous microphones. In response, we propose a windowed cross-attention module capable of dynamically aligning features between all microphones. This module is invariant to both the permutation and the number of microphones and can be easily integrated into existing models. Furthermore, we propose an optimal training target for multi-talker environments.…
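
To make the idea of windowed cross-attention concrete, the following is a minimal sketch, not the authors' implementation. It assumes raw feature frames serve directly as queries, keys, and values (a trained module would presumably use learned projections), and that the per-microphone results are fused by averaging, which is one simple way to obtain invariance to microphone order and count. The function name `windowed_cross_attention` and the parameter `half_window` are hypothetical.

```python
import math


def windowed_cross_attention(ref, others, half_window):
    """Align frames of other microphones to a reference microphone.

    ref:         list of T feature vectors (lists of floats), reference mic.
    others:      list of microphones, each a list of T feature vectors.
    half_window: frame t of ref attends to frames t-half_window..t+half_window.

    Assumption: no learned projections; raw features act as query/key/value.
    """
    num_frames = len(ref)
    dim = len(ref[0])
    scale = 1.0 / math.sqrt(dim)
    fused_frames = []
    for t in range(num_frames):
        lo = max(0, t - half_window)
        hi = min(num_frames, t + half_window + 1)
        fused = [0.0] * dim
        for mic in others:
            # Scaled dot-product scores against keys inside the local window.
            scores = [
                scale * sum(q * k for q, k in zip(ref[t], mic[s]))
                for s in range(lo, hi)
            ]
            # Numerically stable softmax over the window.
            peak = max(scores)
            weights = [math.exp(x - peak) for x in scores]
            total = sum(weights)
            # Weighted sum of windowed values, averaged over microphones
            # (assumed fusion rule; makes the result order-independent).
            for w, s in zip(weights, range(lo, hi)):
                for d in range(dim):
                    fused[d] += (w / total) * mic[s][d] / len(others)
        fused_frames.append(fused)
    return fused_frames


# Toy usage: a second device hears the same frames two frames late.
T = 8
ref = [[10.0 if d == t else 0.0 for d in range(T)] for t in range(T)]
delayed = [[0.0] * T, [0.0] * T] + ref[:-2]
aligned = windowed_cross_attention(ref, [delayed], half_window=3)
```

In this toy setup the attention weights concentrate on the frame two steps ahead in the delayed stream, so `aligned[t]` closely matches `ref[t]` wherever the delayed copy falls inside the window. The window must be at least as wide as the largest expected offset between devices; a plain frame-wise average, by contrast, would mix misaligned frames.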

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Indoor and Outdoor Localization Technologies