Sampling Frequency Independent Dialogue Separation

Jouni Paulus; Matteo Torcoli

arXiv:2206.02124·eess.AS·June 7, 2022

Open Access

TL;DR

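This paper shows that the parameters of DNN models for dialogue separation trained on 8 kHz audio can be transferred to models processing 48 kHz audio with no significant perceptual difference from models trained at 48 kHz, enabling faster training and the use of training data available only at lower sampling frequencies.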

Contribution

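It shows, for two DNN architectures (a U-Net and a fully-convolutional model), that the relevant model parameters for dialogue separation are independent of the sampling frequency of the training audio, so parameters learned at one sampling rate can be transferred to a model operating at another.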

Findings

01. No significant perceptual difference between models trained at 8 kHz and models trained at 48 kHz.

02. Transferring the learned parameters reduces training time and computational cost.

03. Training datasets available at a lower sampling frequency can be used for applications running at a higher sampling frequency.

Abstract

In some DNNs for audio source separation, the relevant model parameters are independent of the sampling frequency of the audio used for training. Considering the application of dialogue separation, this is shown for two DNN architectures: a U-Net and a fully-convolutional model. The models are trained with audio sampled at 8 kHz. The learned parameters are transferred to models for processing audio at 48 kHz. The separated audio sources are compared with the ones produced by the same model architectures trained with 48 kHz versions of the same training data. A listening test and computational measures show that there is no significant perceptual difference between the models trained with 8 kHz or with 48 kHz. This transferability of the learned parameters allows for a faster and computationally less costly training. It also enables using training datasets available at a lower sampling…
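The following is a minimal sketch of one way such parameter transfer can work, not the authors' implementation. It assumes the model operates on a time-frequency representation computed with a fixed frame duration, so that the spacing between frequency bins is the same at both sampling rates and only the number of bins changes. The 32 ms frame duration, the 3x3 kernel, the random placeholder spectrograms, and the helper names `frame_length` and `conv2d_same` are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def frame_length(fs_hz, frame_ms=32.0):
    """Window length in samples for a fixed frame duration (32 ms is assumed)."""
    return int(round(fs_hz * frame_ms / 1000.0))


def conv2d_same(x, kernel):
    """Naive 2-D cross-correlation with zero padding; output has the shape of x."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out


rng = np.random.default_rng(0)

# Hypothetical stand-in for convolution weights learned at 8 kHz.
kernel = rng.standard_normal((3, 3))

n_frames = 10
results = {}
for fs in (8000, 48000):
    n_fft = frame_length(fs)          # window length grows with the sampling rate
    n_bins = n_fft // 2 + 1           # number of non-negative frequency bins
    bin_hz = fs / n_fft               # bin spacing stays constant
    spec = rng.random((n_bins, n_frames))  # placeholder magnitude spectrogram
    out = conv2d_same(spec, kernel)   # the same kernel is applied unchanged
    results[fs] = (n_fft, n_bins, bin_hz, out.shape)

for fs, (n_fft, n_bins, bin_hz, shape) in results.items():
    print(f"fs={fs} Hz: n_fft={n_fft}, bins={n_bins}, "
          f"bin spacing={bin_hz} Hz, output shape={shape}")
```

Under these assumptions the kernel shape never depends on the number of frequency bins, so the same weights can be applied to the wider 48 kHz spectrogram. Whether the output quality carries over is the empirical question the paper evaluates with a listening test and computational measures.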

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net