Dissecting Performance Degradation in Audio Source Separation under Sampling Frequency Mismatch

Kanami Imamura; Tomohiko Nakamura; Kohei Yatabe; Hiroshi Saruwatari

arXiv:2601.14684·cs.SD·January 22, 2026

Open Access

TL;DR

This paper investigates why audio source separation performance degrades under sampling frequency mismatch and proposes resampling methods that improve robustness across models.

Contribution

It introduces noisy-kernel and trainable-kernel resampling techniques that mitigate performance loss due to sampling frequency mismatch in neural network-based audio processing.

Findings

01. Noisy-kernel resampling improves separation quality.

02. Trainable-kernel resampling adapts to different models effectively.

03. Proposed methods outperform conventional resampling.

Abstract

Audio processing methods based on deep neural networks are typically trained at a single sampling frequency (SF). To handle untrained SFs, signal resampling is commonly employed, but it can degrade performance, particularly when the input SF is lower than the trained SF. This paper investigates the causes of this degradation through two hypotheses: (i) the lack of high-frequency components introduced by up-sampling, and (ii) the greater importance of their presence than their precise representation. To examine these hypotheses, we compare conventional resampling with three alternatives: post-resampling noise addition, which adds Gaussian noise to the resampled signal; noisy-kernel resampling, which perturbs the kernel with Gaussian noise to enrich high-frequency components; and trainable-kernel resampling, which adapts the interpolation kernel through training. Experiments on music…
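
To make the difference between the alternatives in the abstract concrete, here is a minimal sketch of conventional up-sampling, noisy-kernel resampling, and post-resampling noise addition. It is an illustration under stated assumptions, not the authors' implementation: the Hann-windowed sinc kernel, the integer up-sampling factor, the kernel half-width, the noise standard deviations, and all function names are hypothetical choices that this page does not specify. Trainable-kernel resampling is omitted because it requires a training loop.

```python
import math
import random


def windowed_sinc_kernel(factor, half_width=8):
    """Hann-windowed sinc kernel for integer up-sampling by `factor`.

    Assumption: the excerpt does not specify the interpolation kernel;
    a windowed sinc is used here only as a common conventional choice.
    """
    center = half_width * factor
    kernel = []
    for i in range(2 * center + 1):
        t = (i - center) / factor  # time measured in input samples
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 * (1.0 + math.cos(math.pi * (i - center) / center))
        kernel.append(sinc * window)
    return kernel


def perturb_kernel(kernel, noise_std, rng):
    """Noisy-kernel variant: add independent Gaussian noise to every tap.

    Assumption: per-tap i.i.d. noise with a hand-picked standard deviation.
    """
    return [k + rng.gauss(0.0, noise_std) for k in kernel]


def upsample(signal, factor, kernel):
    """Zero-stuff by `factor`, then filter with `kernel` (zero-phase alignment)."""
    center = (len(kernel) - 1) // 2
    out = [0.0] * (len(signal) * factor)
    for n, x in enumerate(signal):
        base = n * factor - center
        for i, k in enumerate(kernel):
            m = base + i
            if 0 <= m < len(out):
                out[m] += x * k
    return out


def add_noise(signal, noise_std, rng):
    """Post-resampling noise addition: Gaussian noise on the resampled signal."""
    return [s + rng.gauss(0.0, noise_std) for s in signal]


rng = random.Random(0)
x = [math.sin(2 * math.pi * 0.05 * n) for n in range(64)]
kernel = windowed_sinc_kernel(factor=2)

conventional = upsample(x, 2, kernel)
noisy_kernel = upsample(x, 2, perturb_kernel(kernel, 1e-2, rng))
post_noise = add_noise(conventional, 1e-2, rng)
```

In this sketch the conventional kernel reproduces the input samples at the original sample positions and leaves the new upper band nearly empty. Perturbing the kernel breaks its low-pass shape, so signal-dependent energy leaks into that band. Post-resampling noise addition instead fills the band with noise that is independent of the signal.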

Taxonomy

Topics: Speech and Audio Processing · Blind Source Separation Techniques · Music and Audio Processing