DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
Dongheon Lee and Jung-Woo Choi

TL;DR
DeFT-Mamba is a novel framework for universal multichannel sound separation and polyphonic audio classification. It combines the dense frequency-time attentive network (DeFTAN) with Mamba, adds classification-based source counting, and outperforms existing networks in complex polyphonic scenarios.
Contribution
The paper introduces DeFT-Mamba, a unified framework that combines DeFTAN and Mamba for separating and classifying sounds in multichannel polyphonic audio. It also introduces a classification-based source counting method and separation refinement tuning.
Findings
Outperforms existing networks in complex polyphonic scenarios
Classification-based source counting that outperforms threshold-based methods
Improved separation quality through separation refinement tuning
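The contrast between the two counting strategies can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: it assumes the network emits a fixed number of output slots, that the threshold baseline counts slots whose mean output energy exceeds a fixed threshold, and that the classification-based variant counts slots whose most likely class is not a hypothetical "silence" class (index 0 here). The function names and the silence class are illustrative only.

```python
import numpy as np

def count_by_threshold(outputs, energy_threshold):
    """Threshold baseline (assumed form): count output slots whose
    mean energy exceeds a fixed threshold. outputs: (slots, samples)."""
    energies = np.mean(outputs ** 2, axis=-1)  # one energy value per slot
    return int(np.sum(energies > energy_threshold))

def count_by_classification(class_logits, silence_index=0):
    """Classification-based counting (assumed form): a slot counts as an
    active source when its predicted class is not the hypothetical
    'silence' class. class_logits: (slots, classes)."""
    predicted = np.argmax(class_logits, axis=-1)  # predicted class per slot
    return int(np.sum(predicted != silence_index))

# Toy input: a loud source, a faint source, and an empty slot.
outputs = np.stack([np.full(100, 0.5), np.full(100, 0.05), np.zeros(100)])
class_logits = np.array([[0.1, 2.0, 0.3],   # slot 0 -> class 1
                         [0.2, 0.1, 1.5],   # slot 1 -> class 2
                         [3.0, 0.1, 0.2]])  # slot 2 -> silence

print(count_by_threshold(outputs, energy_threshold=0.01))  # 1
print(count_by_classification(class_logits))               # 2
```

In this toy input the fixed energy threshold misses the faint source while the class decision does not, which illustrates (without reproducing any reported result) why a learned per-slot decision can be less sensitive to source level than a single global threshold.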
Abstract
This paper presents a framework for universal sound separation and polyphonic audio classification, addressing the challenge of separating and classifying individual sound sources in a multichannel mixture. The proposed framework, DeFT-Mamba, combines the dense frequency-time attentive network (DeFTAN) with Mamba to extract sound objects, capturing local time-frequency relations through a gated convolution block and global time-frequency relations through a position-wise Hybrid Mamba. DeFT-Mamba surpasses existing separation and classification networks by a large margin, particularly in complex scenarios involving in-class polyphony. Additionally, a classification-based source counting method is introduced to identify the presence of multiple sources, outperforming conventional threshold-based approaches. Separation refinement tuning is also proposed to improve performance…
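The global branch builds on Mamba's selective state-space scan, where the step size and the input/output projections depend on the input at each position. The sketch below is a generic, single-direction selective scan in NumPy, assuming a diagonal state matrix and the common simplified discretization of B (delta times B); it is not the paper's position-wise Hybrid Mamba, whose details are not given here, and all function and parameter names are illustrative.

```python
import numpy as np

def selective_scan(x, delta, A, B, C, D):
    """Sequential selective state-space scan in the style of Mamba.

    x:     (L, d) input sequence (e.g., frames along time or frequency)
    delta: (L, d) positive, input-dependent step sizes
    A:     (d, n) diagonal state matrix per channel (negative for stability)
    B, C:  (L, n) input-dependent input/output projections
    D:     (d,)   skip-connection weights
    """
    L, d = x.shape
    h = np.zeros((d, A.shape[1]))       # hidden state, one row per channel
    y = np.empty_like(x)
    for t in range(L):
        dA = np.exp(delta[t][:, None] * A)       # (d, n) discretized transition
        dB = delta[t][:, None] * B[t][None, :]   # (d, n) simplified discretized input
        h = dA * h + dB * x[t][:, None]          # recurrent state update
        y[t] = h @ C[t] + D * x[t]               # readout plus skip connection
    return y

def selective_ssm(x, W_delta, W_B, W_C, A, D):
    """Derive delta, B, C from the input itself ("selectivity"), then scan.
    W_delta: (d, d), W_B and W_C: (d, n); all hypothetical parameters."""
    delta = np.log1p(np.exp(x @ W_delta))  # softplus keeps step sizes positive
    B = x @ W_B
    C = x @ W_C
    return selective_scan(x, delta, A, B, C, D)

rng = np.random.default_rng(0)
L, d, n = 6, 4, 3
x = rng.standard_normal((L, d))
A = -np.exp(rng.standard_normal((d, n)))
y = selective_ssm(x, 0.1 * rng.standard_normal((d, d)),
                  rng.standard_normal((d, n)), rng.standard_normal((d, n)),
                  A, np.ones(d))
print(y.shape)  # (6, 4)
```

Because each step only reads the previous state, the cost grows linearly with sequence length, which is the property that makes Mamba attractive for modeling long-range time-frequency relations compared with quadratic attention.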
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Gated Linear Unit · 1x1 Convolution · Gated Convolution · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Convolution
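Several of the listed methods (Gated Linear Unit, 1x1 Convolution, Gated Convolution) fit together in one small operation. The sketch below assumes a (channels, time, frequency) feature tensor and shows a generic GLU-gated 1x1 convolution in NumPy: half of the output channels act as sigmoid gates on the other half. It is not the paper's gated convolution block, which may include further layers such as depthwise convolutions or normalization; the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1x1(x, weight, bias):
    """1x1 convolution over a (channels, time, freq) tensor: the same
    linear map across channels is applied at every time-frequency bin.
    weight: (out_channels, in_channels), bias: (out_channels,)."""
    return np.einsum("oc,ctf->otf", weight, x) + bias[:, None, None]

def gated_conv1x1(x, weight, bias):
    """GLU gating: project to 2 * out_channels, then let one half
    gate the other half through a sigmoid."""
    z = conv1x1(x, weight, bias)          # (2 * out_channels, T, F)
    value, gate = np.split(z, 2, axis=0)  # each (out_channels, T, F)
    return value * sigmoid(gate)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 5, 6))       # 4 channels, 5 frames, 6 bins
w = rng.standard_normal((6, 4))          # 6 = 2 * 3 output channels
y = gated_conv1x1(x, w, np.zeros(6))
print(y.shape)  # (3, 5, 6)
```

The gate lets the layer suppress or pass each value per time-frequency bin, which is the usual motivation for GLU-style gating in convolutional separation networks.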
