Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

Christoph Boeddeker; Wangyou Zhang; Tomohiro Nakatani; Keisuke Kinoshita; Tsubasa Ochiai; Marc Delcroix; Naoyuki Kamo; Yanmin Qian; Reinhold Haeb-Umbach

arXiv:2011.15003·cs.SD·June 9, 2021

TL;DR

This paper trains a neural network supported multi-channel separation system for reverberant speech with a convolutive transfer function invariant SDR (CI-SDR) loss. The resulting system outperforms permutation invariant training and alternative objectives by a large margin.

Contribution

The paper is the first to use CI-SDR, a well-known evaluation metric from BSS Eval, as a training objective for neural network-based multi-channel reverberant speech separation.

Findings

1. Approaches single-source non-reverberant error rates
2. Outperforms permutation invariant training methods
3. Achieves large margin improvements over alternative objectives

Abstract

Time-domain training criteria have proven to be very effective for the separation of single-channel non-reverberant speech mixtures. Likewise, mask-based beamforming has shown impressive performance in multi-channel reverberant speech enhancement and source separation. Here, we propose to combine neural network supported multi-channel source separation with a time-domain training objective function. For the objective we propose to use a convolutive transfer function invariant Signal-to-Distortion Ratio (CI-SDR) based loss. While this is a well-known evaluation metric (BSS Eval), it has not been used as a training objective before. To show the effectiveness, we demonstrate the performance on LibriSpeech based reverberant mixtures. On this task, the proposed system approaches the error rate obtained on single-source non-reverberant input, i.e., LibriSpeech test_clean, with a difference of…
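To make the idea of a convolutive transfer function invariant SDR concrete, the following is a minimal NumPy sketch, not the authors' implementation and not the API of the fgnt/ci_sdr repository. It assumes the BSS Eval style definition: the estimate is projected onto delayed copies of the reference (a short FIR filter fitted by least squares), and the SDR is the energy ratio between that filtered reference and the residual. The function name `ci_sdr`, the toy signals, and the short filter lengths are illustrative assumptions; BSS Eval commonly uses 512 taps. A training loss would be the negative of this value.

```python
import numpy as np


def ci_sdr(reference, estimate, filter_length=512):
    """SDR in dB that ignores any FIR filtering (up to filter_length taps)
    of the reference. Hypothetical helper for illustration only."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    n = len(reference)

    # Column k holds the reference delayed by k samples.
    shifted = np.zeros((n, filter_length))
    for k in range(filter_length):
        shifted[k:, k] = reference[: n - k]

    # Least-squares FIR filter that maps the reference onto the estimate.
    h, *_ = np.linalg.lstsq(shifted, estimate, rcond=None)

    target = shifted @ h            # filtered reference (allowed distortion)
    distortion = estimate - target  # everything the filter cannot explain
    return 10 * np.log10(np.sum(target**2) / np.sum(distortion**2))


# Toy example: the "estimate" is the reference passed through a short
# filter (a stand-in for reverberation) plus a little noise.
rng = np.random.default_rng(0)
ref = rng.standard_normal(2000)
h_true = np.array([0.1, 0.3, 1.0, 0.5, -0.2])
est = np.convolve(ref, h_true)[:2000] + 0.01 * rng.standard_normal(2000)

sdr_filter_invariant = ci_sdr(ref, est, filter_length=8)
sdr_scale_only = ci_sdr(ref, est, filter_length=1)
print(f"8-tap filter allowed: {sdr_filter_invariant:.1f} dB")
print(f"scaling only:         {sdr_scale_only:.1f} dB")
```

With a single tap the projection reduces to a pure scaling of the reference, so the filtering in the toy estimate is counted as distortion. Allowing a multi-tap filter absorbs it, which is the invariance the loss name refers to.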

Code & Models

Repositories

fgnt/ci_sdr (PyTorch, official)
