# WHAM!: Extending Speech Separation to Noisy Environments

**Authors:** Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux

arXiv: 1907.01160 · 2019-07-03

## TL;DR

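This paper introduces WHAM!, a dataset that combines two-speaker mixtures from wsj0-2mix with real ambient noise for more realistic speech separation evaluation. Benchmarks of several architectures and objective functions show that noise degrades separation performance, yet most approaches still yield substantial gains relative to the noisy input.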

## Contribution

The paper presents WHAM!, a new noisy speech separation dataset built from wsj0-2mix mixtures and real ambient noise recordings, and benchmarks existing separation architectures and objective functions for their robustness to that noise.

## Key findings

- Separation performance decreases when ambient noise is added, but most approaches still show substantial gains relative to the noisy input signals.
- The benchmark compares several speech separation architectures and objective functions on their robustness to noise.
- The WHAM! dataset enables more realistic evaluation of speech separation methods, and its noise samples are publicly available.
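
The phrase "gains relative to the noisy signals" can be made concrete with a small sketch. This summary does not name the paper's metric, so the example below assumes scale-invariant signal-to-distortion ratio (SI-SDR), a common choice in speech separation work; the function names and the `eps` guard are illustrative, not taken from the paper.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences of samples."""
    return sum(x * y for x, y in zip(a, b))


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB (assumed metric, not confirmed by this summary).

    The estimate is projected onto the target; whatever is left over
    is treated as distortion. eps avoids division by zero for a perfect estimate.
    """
    alpha = dot(estimate, target) / dot(target, target)
    projection = [alpha * t for t in target]
    residual = [e - p for e, p in zip(estimate, projection)]
    return 10.0 * math.log10(dot(projection, projection) / (dot(residual, residual) + eps))


def si_sdr_improvement(estimate, mixture, target):
    """Gain of the separated estimate over the unprocessed noisy mixture, in dB."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Toy signals: the interference is orthogonal to the target.
target = [1.0, 0.0, -1.0, 0.0]
interference = [0.0, 1.0, 0.0, -1.0]
mixture = [t + n for t, n in zip(target, interference)]
estimate = [t + 0.1 * n for t, n in zip(target, interference)]
gain_db = si_sdr_improvement(estimate, mixture, target)
```

A positive `gain_db` means the separated output is closer to the clean target than the noisy mixture was, which is the sense in which a model can degrade under noise in absolute terms and still improve on its input.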

## Abstract

Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches.
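
As a rough illustration of what "combined with real ambient noise samples" involves, the sketch below scales a noise recording to reach a chosen signal-to-noise ratio and adds it to a speech mixture. It is a minimal sketch under stated assumptions: the helper names, the truncation to the shorter signal, and the 6 dB value are hypothetical, and the paper's actual mixing procedure is described only in the full text.

```python
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio is snr_db.

    Hypothetical helper: both signals are truncated to the shorter length,
    and only the noise is rescaled. Returns (noisy_mixture, scaled_noise).
    """
    n = min(len(speech), len(noise))
    speech, noise = speech[:n], noise[:n]
    # Solve power(speech) / (gain**2 * power(noise)) == 10 ** (snr_db / 10) for gain.
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    scaled = [gain * x for x in noise]
    noisy = [s + v for s, v in zip(speech, scaled)]
    return noisy, scaled


# Toy example with an illustrative 6 dB target SNR.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5, 0.5]
noisy, scaled = mix_at_snr(speech, noise, snr_db=6.0)
measured_snr_db = 10.0 * math.log10(power(speech) / power(scaled))
```

Returning the scaled noise alongside the mixture keeps the clean reference and the noise component available separately, which is what an evaluation against clean targets needs.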

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.01160/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1907.01160/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1907.01160/full.md

---
Source: https://tomesphere.com/paper/1907.01160