Input-Adaptive Spectral Feature Compression by Sequence Modeling for Source Separation

Kohei Saijo; Yoshiaki Bando

arXiv:2602.08671 · eess.AS · February 10, 2026

TL;DR

This paper introduces Spectral Feature Compression (SFC), a novel, input-adaptive, and parameter-efficient method for compressing frequency information in source separation models. It outperforms the conventional band-split (BS) module.

Contribution

The paper proposes SFC, a sequence-modeling approach to frequency compression that addresses two limitations of the band-split module: it is input-adaptive, and it needs fewer parameters. Two variants are presented, one based on cross-attention and one based on Mamba.
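
As a rough illustration of how cross-attention can compress frequency information in an input-dependent way, the sketch below pools F frequency-bin features into K vectors using K learnable queries. It is a generic cross-attention pooling written for this page, not the authors' SFC implementation. The function name `cross_attention_compress`, the sizes (F = 1025 bins as from a 2048-point STFT, C = 2 input channels such as the real and imaginary parts of one frame, K = 64 queries, D = 32 dimensions) and the random weights are all assumptions.

```python
import numpy as np

def cross_attention_compress(x, queries, w_k, w_v):
    """Hypothetical cross-attention pooling over the frequency axis.

    x:        (F, C) features of one time frame, one row per frequency bin
    queries:  (K, D) learnable query vectors, K < F (assumed design)
    w_k, w_v: (C, D) key/value projections shared by all bins
    Returns the (K, D) compressed features and the (K, F) attention weights.
    """
    keys = x @ w_k                                          # (F, D)
    values = x @ w_v                                        # (F, D)
    scores = queries @ keys.T / np.sqrt(queries.shape[1])   # (K, F)
    scores -= scores.max(axis=1, keepdims=True)             # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)           # softmax over bins
    return weights @ values, weights                        # (K, D), (K, F)

rng = np.random.default_rng(0)
F, C, K, D = 1025, 2, 64, 32          # illustrative sizes, not from the paper
queries = rng.standard_normal((K, D))
w_k = rng.standard_normal((C, D))
w_v = rng.standard_normal((C, D))

# Two different inputs compressed by the same (shared) parameters.
x1 = rng.standard_normal((F, C))
x2 = rng.standard_normal((F, C))
z1, a1 = cross_attention_compress(x1, queries, w_k, w_v)
z2, a2 = cross_attention_compress(x2, queries, w_k, w_v)

print(z1.shape)             # (64, 32)
print(np.allclose(a1, a2))  # False: the pooling weights depend on the input
n_params = queries.size + w_k.size + w_v.size
print(n_params)             # 64*32 + 2*32 + 2*32 = 2176
```

Because the key and value projections are shared across all bins, the parameter count in this sketch does not depend on F or on any subband layout, and which bins each query pools from changes with the input.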

Findings

01

SFC outperforms the band-split module on music source separation (MSS) and cinematic audio source separation (CASS).

02

SFC adapts its compression to each input, capturing input-dependent frequency patterns.

03

SFC maintains performance across different separator sizes and compression ratios.

Abstract

Time-frequency domain dual-path models have demonstrated strong performance and are widely used in source separation. Because their computational cost grows with the number of frequency bins, these models often use the band-split (BS) module in high-sampling-rate tasks such as music source separation (MSS) and cinematic audio source separation (CASS). The BS encoder compresses frequency information by encoding features for each predefined subband. It achieves effective compression by introducing an inductive bias that places greater emphasis on low-frequency parts. Despite its success, the BS module has two inherent limitations: (i) it is not input-adaptive, preventing the use of input-dependent information, and (ii) the parameter count is large, since each subband requires a dedicated module. To address these issues, we propose Spectral Feature Compression (SFC). SFC compresses the…
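
The band-split encoder described in the abstract can be sketched as follows. The band layout is hypothetical: for a 2048-point STFT (1025 bins) it uses 4-bin bands at low frequencies and progressively wider bands higher up, with C = 2 input channels and D = 32 output dimensions. Each subband gets its own linear projection; published BS modules may add further per-band layers (for example normalization), and none of these sizes come from the paper.

```python
import numpy as np

# Hypothetical layout: narrow bands at low frequencies, wide bands at high
# frequencies (the low-frequency inductive bias of the BS module).
BAND_WIDTHS = [4] * 32 + [16] * 16 + [128] * 4 + [129]
N_BINS, C, D = 1025, 2, 32
assert sum(BAND_WIDTHS) == N_BINS

rng = np.random.default_rng(0)
# One dedicated linear projection (weight, bias) per subband.
band_params = [
    (rng.standard_normal((w * C, D)) / np.sqrt(w * C), np.zeros(D))
    for w in BAND_WIDTHS
]

def band_split_encode(x):
    """x: (N_BINS, C) features of one frame -> (n_bands, D) compressed features."""
    out, start = [], 0
    for (weight, bias), w in zip(band_params, BAND_WIDTHS):
        band = x[start:start + w].reshape(-1)  # flatten this subband's bins and channels
        out.append(band @ weight + bias)        # band-specific projection to D dims
        start += w
    return np.stack(out)

x = rng.standard_normal((N_BINS, C))
z = band_split_encode(x)
print(z.shape)   # (53, 32)
n_params = sum(weight.size + bias.size for weight, bias in band_params)
print(n_params)  # 2*32*1025 + 53*32 = 67296
```

The band boundaries are fixed, so the same bins are grouped whatever the input (limitation (i)). Every subband needs its own module, and in this sketch the projection weights alone total C·D·F entries, so they scale with the number of frequency bins (limitation (ii)). The compression also shortens the frequency axis the dual-path separator has to model, here from 1025 bins to 53 bands.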

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis