# Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming

**Authors:** Yoshiki Masuyama, Masahito Togami, and Tatsuya Komatsu

arXiv: 1907.04984 · 2019-07-12

## TL;DR

This paper introduces multichannel loss functions for training neural networks to improve mask-based beamforming in speech source separation, leading to more effective and robust separation performance across microphone setups.

## Contribution

The paper proposes novel multichannel loss functions based on the multichannel Itakura-Saito divergence for training DNNs in mask-based beamforming, enhancing separation quality.
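For orientation, the multichannel Itakura-Saito divergence between two $M \times M$ Hermitian positive-definite matrices is commonly written as below. The symbols are notation chosen for this note, not taken from the paper: $\mathbf{R}$ is a reference spatial covariance matrix, $\hat{\mathbf{R}}$ is its estimate, and $M$ is the number of microphones. The paper's actual loss functions may sum this quantity over time-frequency bins or include further terms that this summary does not show.

```latex
D_{\mathrm{IS}}\bigl(\mathbf{R}, \hat{\mathbf{R}}\bigr)
  = \operatorname{tr}\bigl(\mathbf{R}\,\hat{\mathbf{R}}^{-1}\bigr)
  - \log\det\bigl(\mathbf{R}\,\hat{\mathbf{R}}^{-1}\bigr)
  - M
```

The divergence is zero when $\hat{\mathbf{R}} = \mathbf{R}$, and for $M = 1$ it reduces to the scalar Itakura-Saito divergence $x/\hat{x} - \log(x/\hat{x}) - 1$.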

## Key findings

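- DNNs trained with the multichannel loss functions were effective for speech source separation by mask-based beamforming in the reported experiments.
- The approach was robust to different microphone configurations.
- It outperforms loss functions designed for monaural speech enhancement/separation in the reported experiments.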

## Abstract

In this paper, we propose two mask-based beamforming methods using a deep neural network (DNN) trained by multichannel loss functions. Beamforming techniques using time-frequency (TF) masks estimated by a DNN have been applied in many applications, where the TF masks are used to estimate spatial covariance matrices. To train a DNN for mask-based beamforming, loss functions designed for monaural speech enhancement/separation have been employed. Although such a training criterion is simple, it does not directly correspond to the performance of mask-based beamforming. To overcome this problem, we use multichannel loss functions that evaluate the estimated spatial covariance matrices based on the multichannel Itakura-Saito divergence. DNNs trained by the multichannel loss functions can be applied to construct several beamformers. Experimental results confirmed their effectiveness and robustness to microphone configurations.
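The pipeline the abstract describes (TF masks, then spatial covariance matrices, then a beamformer, with a divergence evaluated on the covariance matrices) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the array layout `(F, T, M)`, the diagonal loading `eps`, and the choice of an MVDR beamformer in its reference-channel form are all choices made for this example, and random arrays stand in for real STFTs and DNN-estimated masks.

```python
import numpy as np


def masked_scm(X, mask, eps=1e-8):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    X: complex STFT of the mixture, shape (F, T, M).
    mask: real TF mask in [0, 1], shape (F, T).
    Returns an array of shape (F, M, M).
    """
    num = np.einsum("ft,ftm,ftn->fmn", mask, X, X.conj())
    return num / (mask.sum(axis=1)[:, None, None] + eps)


def multichannel_is_divergence(R_ref, R_est, eps=1e-6):
    """tr(R_ref R_est^-1) - log det(R_ref R_est^-1) - M, per frequency bin.

    The diagonal loading eps is an assumption made here for numerical safety.
    """
    M = R_ref.shape[-1]
    load = eps * np.eye(M)
    A = (R_ref + load) @ np.linalg.inv(R_est + load)
    trace = np.trace(A, axis1=-2, axis2=-1).real
    _, logabsdet = np.linalg.slogdet(A)
    return trace - logabsdet - M


def mvdr_weights(R_s, R_n, ref=0, eps=1e-6):
    """MVDR weights w = (R_n^-1 R_s) u / tr(R_n^-1 R_s), shape (F, M).

    u selects the reference microphone `ref` (hypothetical default: 0).
    """
    M = R_s.shape[-1]
    G = np.linalg.solve(R_n + eps * np.eye(M), R_s)  # R_n^-1 R_s
    tr = np.trace(G, axis1=-2, axis2=-1)[:, None]
    return G[:, :, ref] / (tr + eps)


def apply_beamformer(w, X):
    """Beamformer output y[f, t] = w[f]^H x[f, t], shape (F, T)."""
    return np.einsum("fm,ftm->ft", w.conj(), X)


# Toy data: random values stand in for a real STFT and a DNN-estimated mask.
rng = np.random.default_rng(0)
F, T, M = 4, 200, 3
X = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
mask = rng.uniform(0.05, 0.95, size=(F, T))

R_s = masked_scm(X, mask)        # "speech" covariance from the mask
R_n = masked_scm(X, 1.0 - mask)  # "noise" covariance from the complement
loss = multichannel_is_divergence(R_s, R_n).mean()
w = mvdr_weights(R_s, R_n)
Y = apply_beamformer(w, X)
```

In a training setup, `R_ref` would come from the clean source images and `R_est` from the mask predicted by the DNN, so the loss scores the covariance matrices that the beamformer actually consumes rather than the mask itself. A differentiable framework would replace NumPy there; this sketch only shows the arithmetic.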

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.04984/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1907.04984/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1907.04984/full.md

---
Source: https://tomesphere.com/paper/1907.04984