DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification

Dongheon Lee and Jung-Woo Choi

arXiv:2409.12413·eess.AS·September 12, 2025·2 cites

Open Access

TL;DR

DeFT-Mamba is a framework for universal multichannel sound separation and polyphonic audio classification. It combines the dense frequency-time attentive network (DeFTAN) with Mamba, adds classification-based source counting, and outperforms existing separation and classification networks, particularly in scenarios with in-class polyphony.

Contribution

The paper introduces DeFT-Mamba, a unified framework that combines DeFTAN and Mamba for sound separation and classification in multichannel polyphonic audio, along with a classification-based source counting method and separation refinement tuning.

Findings

01. Outperforms existing networks in complex polyphonic scenarios

02. Effective source counting surpassing threshold-based methods

03. Improved separation quality through refinement tuning
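
To make the contrast in the second finding concrete, here is a minimal sketch of the two counting styles. The listing only says that the method is "classification-based" and that it outperforms "threshold-based" approaches, so everything below is an illustrative assumption rather than the authors' implementation: the function names, the example scores, and the idea of a separate count head with one logit per candidate count are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def count_by_threshold(activity_scores, threshold=0.5):
    """Baseline: count per-class activity scores above a fixed threshold."""
    return sum(1 for score in activity_scores if score > threshold)


def count_by_classification(count_logits):
    """Hypothetical count head: index i of count_logits means 'i sources'."""
    probs = softmax(count_logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up scores for one mixture (not data from the paper).
activity = [0.55, 0.48, 0.45, 0.05]         # per-class activity scores
count_logits = [-2.0, 0.1, 0.4, 2.3, -1.0]  # logits for 0..4 sources

print(count_by_threshold(activity))           # 1
print(count_by_classification(count_logits))  # 3
```

In this toy case two borderline scores fall just under the threshold, so the baseline reports one source while the count head reports three. A per-class threshold also cannot report two sources of the same class, which may be part of why a dedicated counting step helps under in-class polyphony, though the listing does not spell out the mechanism.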

Abstract

This paper presents a framework for universal sound separation and polyphonic audio classification, addressing the challenges of separating and classifying individual sound sources in a multichannel mixture. The proposed framework, DeFT-Mamba, utilizes the dense frequency-time attentive network (DeFTAN) combined with Mamba to extract sound objects, capturing the local time-frequency relations through a gated convolution block and the global time-frequency relations through a position-wise Hybrid Mamba. DeFT-Mamba surpasses existing separation and classification networks by a large margin, particularly in complex scenarios involving in-class polyphony. Additionally, a classification-based source counting method is introduced to identify the presence of multiple sources, outperforming conventional threshold-based approaches. Separation refinement tuning is also proposed to improve performance…
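
The gating idea behind a gated convolution can be sketched in a few lines: one convolution produces the content, a second produces a gate squashed to (0, 1), and the two are multiplied elementwise, as in a gated linear unit. This is a generic 1D illustration only. The abstract does not give the block's layout, and the function names, kernels, and zero padding here are assumptions chosen for readability, not the authors' design.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation; output length equals input length.

    Assumes an odd-length kernel so the padding is symmetric.
    """
    size = len(kernel)
    pad = size // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(size))
        for i in range(len(signal))
    ]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def gated_conv1d(signal, content_kernel, gate_kernel):
    """GLU-style gating: content branch times sigmoid of the gate branch."""
    content = conv1d_same(signal, content_kernel)
    gate = [sigmoid(g) for g in conv1d_same(signal, gate_kernel)]
    return [c * g for c, g in zip(content, gate)]


# Identity content kernel and an all-zero gate kernel: every gate is
# sigmoid(0) = 0.5, so the output is the input scaled by one half.
x = [1.0, 2.0, 3.0]
y = gated_conv1d(x, content_kernel=[0.0, 1.0, 0.0], gate_kernel=[0.0, 0.0, 0.0])
print(y)  # [0.5, 1.0, 1.5]
```

Because each output depends only on a small neighbourhood of the input, a block like this captures local structure. That matches the abstract's division of labour, in which the gated convolution handles local time-frequency relations and Mamba handles the global ones.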

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies

Methods: Gated Linear Unit · 1x1 Convolution · Gated Convolution · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Convolution