Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method

Koichi Saito; Tomohiko Nakamura; Kohei Yatabe; Yuma Koizumi; Hiroshi Saruwatari

arXiv:2105.04079 · cs.SD · May 11, 2021 · 1 citation

TL;DR

This paper introduces a convolution layer based on the impulse invariant method that allows a single deep neural network to perform audio source separation across arbitrary sampling frequencies, including unseen ones.

Contribution

The proposed convolution layer enables DNN-based audio source separation models to operate effectively across varying and unseen sampling frequencies.

Findings

01 The model works with unseen sampling frequencies.

02 A single model can handle multiple sampling rates.

03 The proposed layer improves the versatility of audio source separation models.

Abstract

Audio source separation is often used as preprocessing of various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with the varieties of audio signals. Since sampling frequency, one of the audio signal varieties, is usually application specific, the preceding audio source separation model should be able to deal with audio signals of all sampling frequencies specified in the target applications. However, conventional models based on deep neural networks (DNNs) are trained only at the sampling frequency specified by the training data, and there are no guarantees that they work with unseen sampling frequencies. In this paper, we propose a convolution layer capable of handling arbitrary sampling frequencies by a single DNN. Through music source separation experiments, we show that the introduction of the proposed layer enables a…
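The impulse invariant method named in the title can be illustrated with a minimal sketch: a filter is defined in continuous time, and its discrete-time weights are obtained by sampling that impulse response at whatever sampling frequency the input uses. Everything specific below is an assumption made for illustration and is not taken from the paper: the Gaussian-windowed cosine used as the analog filter, its parameters, and the helper names `analog_impulse_response`, `impulse_invariant_kernel`, `gain_at`, and `apply_layer`.

```python
import numpy as np

def analog_impulse_response(t, center_hz=1000.0, sigma_s=0.0005, delay_s=0.0025):
    """Hypothetical analog filter: a Gaussian-windowed cosine in continuous time."""
    tau = t - delay_s
    return np.exp(-0.5 * (tau / sigma_s) ** 2) * np.cos(2.0 * np.pi * center_hz * tau)

def impulse_invariant_kernel(fs, duration_s=0.005):
    """Sample the analog impulse response every T = 1/fs seconds and scale by T."""
    n_taps = int(round(duration_s * fs))
    t = np.arange(n_taps) / fs
    return analog_impulse_response(t) / fs

def gain_at(freq_hz, fs):
    """Magnitude of the kernel's frequency response at freq_hz (in Hz)."""
    kernel = impulse_invariant_kernel(fs)
    n = np.arange(len(kernel))
    return np.abs(np.sum(kernel * np.exp(-2j * np.pi * freq_hz * n / fs)))

def apply_layer(x, fs):
    """Filter a 1-D signal sampled at fs with the kernel generated for that fs."""
    return np.convolve(x, impulse_invariant_kernel(fs), mode="full")[: len(x)]

# Same analog filter, three sampling frequencies: tap count changes, gain does not.
for fs in (8000, 16000, 32000):
    print(fs, len(impulse_invariant_kernel(fs)), gain_at(1000.0, fs))
```

In this sketch the number of taps grows with the sampling frequency while the gain at 1 kHz stays nearly constant, because the assumed filter's passband lies well below the Nyquist frequency of every rate tried. In a trainable layer, the analog filter's parameters would be the learned quantities, and the kernel would be regenerated for each input sampling frequency.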

Code & Models

Repositories

TomohikoNakamura/sfi_convtasnet (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Convolution