CaSNet: Compress-and-Send Network Based Multi-Device Speech Enhancement Model for Distributed Microphone Arrays

Chengqian Jiang; Jie Zhang; Haoyin Yan

arXiv:2601.17711 · cs.SD · January 27, 2026

TL;DR

CaSNet introduces a resource-efficient speech enhancement method for distributed microphone arrays that compresses data at each device using SVD, reducing bandwidth while maintaining high speech quality.
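
As a rough illustration of why SVD reduces bandwidth, the sketch below truncates a device's C × T feature matrix to rank k and sends two thin factors instead of the full matrix. This cuts the payload from C·T values to k·(C + T) values. It is a minimal sketch under stated assumptions, not CaSNet's implementation. The function names (`svd_compress`, `svd_decompress`, `payload_ratio`), the matrix shapes, the choice to fold the singular values into the left factor, and the rank are all hypothetical. The paper's encoder, rank selection and transmission format are not described here.

```python
import numpy as np


def svd_compress(features, rank):
    """Keep the top-`rank` singular triplets of a (C x T) feature matrix.

    Hypothetical helper: returns two thin factors that a device would send
    instead of the full matrix.
    """
    if not 1 <= rank <= min(features.shape):
        raise ValueError("rank must be between 1 and min(C, T)")
    u, s, vt = np.linalg.svd(features, full_matrices=False)
    # Fold the singular values into the left factor so only two matrices are sent.
    left = u[:, :rank] * s[:rank]   # shape (C, rank)
    right = vt[:rank, :]            # shape (rank, T)
    return left, right


def svd_decompress(left, right):
    """Rebuild the rank-limited approximation at the receiver."""
    return left @ right


def payload_ratio(shape, rank):
    """Fraction of values sent: rank * (C + T) compared with C * T."""
    c, t = shape
    return rank * (c + t) / (c * t)
```

For a hypothetical 64 × 200 feature matrix truncated to rank 8, `payload_ratio((64, 200), 8)` is about 0.165, meaning roughly 16.5% of the uncompressed values are sent. By the Eckart–Young theorem, truncated SVD gives the best rank-k approximation in the Frobenius norm. That optimality is one reason it is a natural fit for this kind of compression.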

Contribution

The paper proposes a novel Compress-and-Send Network (CaSNet) that enables efficient data transmission and effective speech enhancement in distributed microphone arrays.

Findings

01. Significant data reduction with minimal performance loss.

02. Effective alignment and neural decoding for spatially coherent speech.

03. Comparable results to uncompressed methods on multiple datasets.

Abstract

The distributed microphone array (DMA) is a promising next-generation platform for speech interaction, where speech enhancement (SE) is still required to improve speech quality in noisy conditions. Existing SE methods usually first gather the raw waveforms from all devices at a fusion center (FC) and then design a multi-microphone model, which incurs high bandwidth and energy costs. In this work, we propose a Compress-and-Send Network (CaSNet) for resource-constrained DMAs, where one microphone serves as both the FC and the reference. Each of the other devices encodes its measured raw data into a feature matrix, which is then compressed by singular value decomposition (SVD) to produce a more compact representation. The features received at the FC are aligned to the reference via cross window query, followed by neural decoding to yield spatially coherent enhanced speech. Experiments on multiple…
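
The alignment step can be pictured as attention restricted to a local time window. Each reference frame acts as a query and attends over nearby frames of a received device's features, which lets the FC compensate for small inter-device delays. The sketch below is a parameter-free stand-in that uses scaled dot-product attention with a softmax over a window of ±`half_window` frames. It reflects an assumption about the general idea and is not the paper's cross window query, which is presumably learned and may use projections, multiple heads, or other details not given here. The function name, the (T, D) feature layout and the window size are hypothetical.

```python
import numpy as np


def cross_window_query(ref, dev, half_window):
    """Align `dev` features (T x D) to `ref` features (T x D).

    Hypothetical, parameter-free stand-in: for each reference frame, attend
    over device frames within +/- half_window using scaled dot-product scores.
    """
    num_frames, dim = ref.shape
    aligned = np.zeros_like(dev, dtype=float)
    for t in range(num_frames):
        # Clip the local window to the valid frame range.
        lo, hi = max(0, t - half_window), min(num_frames, t + half_window + 1)
        keys = dev[lo:hi]                          # (W, D) candidate device frames
        scores = keys @ ref[t] / np.sqrt(dim)      # similarity to the reference frame
        weights = np.exp(scores - scores.max())    # numerically stable softmax
        weights /= weights.sum()
        aligned[t] = weights @ keys                # weighted sum of windowed frames
    return aligned
```

Suppose a device's features are the reference delayed by two frames and the frames are nearly orthogonal. Then `cross_window_query(ref, dev, half_window=3)` recovers the reference for every frame whose delayed match falls inside the window. A window that is small relative to the actual delay would miss the match, so the window size trades off delay tolerance against computation.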

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques