SA-SDR: A novel loss function for separation of meeting style data

Thilo von Neumann; Keisuke Kinoshita; Christoph Boeddeker; Marc Delcroix; Reinhold Haeb-Umbach

arXiv:2110.15581 · eess.AS · April 22, 2022

TL;DR

This paper introduces SA-SDR, a new loss function for source separation that improves stability and robustness in meeting-style data with silent or single-speaker regions, addressing limitations of existing SDR-based losses.

Contribution

The paper proposes a source-aggregated SDR (SA-SDR) loss for neural source separation that stays well defined when a reference signal is silent or an output is reconstructed perfectly, the two cases in which the plain SDR is undefined.

Findings

1. SA-SDR is more stable than traditional SDR modifications.
2. SA-SDR performs better on meeting-style data with silent regions.
3. The proposed loss improves training stability and separation quality.

Abstract

Many state-of-the-art neural network-based source separation systems use the averaged Signal-to-Distortion Ratio (SDR) as a training objective function. The basic SDR is, however, undefined if the network reconstructs the reference signal perfectly or if the reference signal contains silence, e.g., when a two-output separator processes a single-speaker recording. Many modifications to the plain SDR have been proposed that trade off robustness of the loss against distortion of its value. We propose to switch from a mean over the SDRs of each individual output channel to a global SDR over all output channels at the same time, which we call source-aggregated SDR (SA-SDR). This makes the loss robust against silence and perfect reconstruction as long as at least one reference signal is not silent. We experimentally show that our proposed SA-SDR is more stable and preferable to other…
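To make the difference between the two objectives concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes the "global SDR over all output channels" means summing the reference energies and the error energies across channels before taking the ratio, i.e. 10·log10(Σ_k ‖s_k‖² / Σ_k ‖s_k − ŝ_k‖²). The function names and the toy signals are hypothetical, and a training loss would typically be the negative of these values.

```python
import math


def energy(signal):
    """Sum of squared samples."""
    return sum(x * x for x in signal)


def ratio_db(numerator, denominator):
    """10*log10(numerator/denominator) with the degenerate cases made explicit."""
    if numerator == 0 and denominator == 0:
        return math.nan
    if denominator == 0:
        return math.inf   # perfect reconstruction: zero error energy
    if numerator == 0:
        return -math.inf  # silent reference: zero signal energy
    return 10 * math.log10(numerator / denominator)


def sdr_db(reference, estimate):
    """Plain SDR of a single output channel, in dB."""
    error = [r - e for r, e in zip(reference, estimate)]
    return ratio_db(energy(reference), energy(error))


def averaged_sdr_db(references, estimates):
    """Mean over the per-channel SDRs (the conventional objective)."""
    values = [sdr_db(r, e) for r, e in zip(references, estimates)]
    return sum(values) / len(values)


def sa_sdr_db(references, estimates):
    """Source-aggregated SDR (assumed form): energies are summed over
    all channels before the ratio is taken."""
    signal_energy = sum(energy(r) for r in references)
    error_energy = sum(
        energy([x - y for x, y in zip(r, e)])
        for r, e in zip(references, estimates)
    )
    return ratio_db(signal_energy, error_energy)


# Hypothetical toy case: a two-output separator fed a single-speaker
# recording, so the second reference is silent.
speaker = [math.sin(0.1 * n) for n in range(1000)]
silence = [0.0] * 1000
references = [speaker, silence]
# Channel 1 is slightly attenuated; channel 2 leaks a small constant offset.
estimates = [[0.9 * x for x in speaker], [0.01] * 1000]

averaged = averaged_sdr_db(references, estimates)
aggregated = sa_sdr_db(references, estimates)
print("averaged SDR:", averaged)
print("SA-SDR:", aggregated)
```

In this toy case the silent channel drives the averaged SDR to negative infinity, while the aggregated value stays finite (just under 20 dB) because the non-silent reference contributes energy to the numerator.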

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing