DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction

Jiangyu Han; Yanhua Long; Lukas Burget; Jan Cernocky

arXiv:2112.13520·eess.AS·February 1, 2022·ICASSP

TL;DR

This paper introduces DPCCN, a robust time-frequency domain speech separation and extraction network that outperforms existing time-domain methods, especially in cross-domain and noisy environments, by using a densely-connected pyramid structure and a novel speaker encoder.

Contribution

The paper presents a novel densely-connected pyramid complex convolutional network (DPCCN) for robust speech separation and extraction, including a new speaker encoder and a Mixture-Remix adaptation method for cross-domain tasks.

Findings

01

DPCCN outperforms time-domain methods in robustness and accuracy.

02

Mixture-Remix fine-tuning significantly improves cross-domain speech extraction.

03

DPCCN achieves around 3.5 dB SISNR improvement in cross-domain tests.
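
For readers unfamiliar with the metric in the findings above, here is a minimal sketch of the commonly used scale-invariant signal-to-noise ratio (SI-SNR, written SISNR on this page) and of the improvement over the unprocessed mixture. It assumes the standard zero-mean, projection-based definition; the paper's exact evaluation code may differ. The function names `si_snr` and `si_snr_improvement` and the synthetic signals are illustrative only and do not come from the paper or its repository.

```python
import math
import random


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything that is not explained by the reference counts as noise.
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(est, target))
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNR gain of the estimate over the unprocessed mixture, in dB."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


# Synthetic illustration (assumed data, not from the paper): white noise
# stands in for both the target speaker and the interfering speaker.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(4000)]
interferer = [rng.gauss(0.0, 1.0) for _ in range(4000)]
mixture = [r + i for r, i in zip(reference, interferer)]
# A hypothetical separator that suppresses the interferer by a factor of 10.
estimate = [r + 0.1 * i for r, i in zip(reference, interferer)]

print(f"SI-SNR improvement: {si_snr_improvement(estimate, mixture, reference):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged. This is why the metric is preferred over plain SNR when a separator's output gain is arbitrary.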

Abstract

In recent years, a number of time-domain speech separation methods have been proposed. However, most of them are very sensitive to the acoustic environment and to tasks with wide domain coverage. In this paper, from the time-frequency domain perspective, we propose a densely-connected pyramid complex convolutional network, termed DPCCN, to improve the robustness of speech separation under complicated conditions. Furthermore, we generalize the DPCCN to target speech extraction (TSE) by integrating a new, specially designed speaker encoder. We also investigate the robustness of DPCCN on unsupervised cross-domain TSE tasks. A Mixture-Remix approach is proposed to adapt to the target-domain acoustic characteristics when fine-tuning the source model. We evaluate the proposed methods not only under noisy and reverberant in-domain conditions, but also under clean, cross-domain conditions. Results show…

Code & Models

Repositories

jyhan03/icassp22-dataset (Official)
