EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

Julius Richter; Yi-Chiao Wu; Steven Krenn; Simon Welker; Bunlong Lay; Shinji Watanabe; Alexander Richard; Timo Gerkmann

arXiv:2406.06185·eess.AS·June 13, 2024

TL;DR

The paper introduces the EARS dataset, a comprehensive high-quality anechoic speech dataset with diverse speaking styles, and benchmarks various speech enhancement and dereverberation methods using instrumental metrics and listening tests.

Contribution

It provides a new large-scale, diverse speech dataset and establishes benchmark evaluations for speech enhancement and dereverberation methods.

Findings

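01. In a listening test with 20 participants on the speech enhancement task, a generative method is preferred.

02. Benchmarking with instrumental metrics reveals strengths and weaknesses of current speech enhancement and dereverberation methods.

03. A blind test set allows automatic online evaluation of uploaded data.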

Abstract

We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data. The dataset covers a wide range of speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, where a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. The dataset download links and the automatic evaluation server can be found online.
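The abstract does not name the instrumental metrics used in the benchmark. As one illustration of what such a metric computes, the sketch below implements the scale-invariant signal-to-distortion ratio (SI-SDR), a common intrusive measure that compares an enhanced signal against its clean reference. The choice of SI-SDR, the function name `si_sdr`, the 48 kHz sample rate, and the synthetic sine-plus-noise signals are all assumptions made for illustration; none of them is taken from the paper's evaluation code.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between an estimate and a clean reference.

    Hypothetical helper for illustration; not the benchmark's implementation.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove any DC offset so the projection is not biased by the mean.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to get the scaled target part.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference

    # Everything that is not explained by the reference counts as distortion.
    distortion = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


# Synthetic stand-ins for a clean utterance and two degraded versions of it.
rng = np.random.default_rng(0)
sample_rate = 48_000  # assumed fullband sample rate
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 220 * t)
noisy = clean + 0.3 * rng.standard_normal(sample_rate)
lightly_noisy = clean + 0.03 * rng.standard_normal(sample_rate)

score_noisy = si_sdr(noisy, clean)
score_lightly_noisy = si_sdr(lightly_noisy, clean)
print(f"SI-SDR noisy:         {score_noisy:.2f} dB")
print(f"SI-SDR lightly noisy: {score_lightly_noisy:.2f} dB")
```

Because the metric is scale-invariant, multiplying the estimate by a constant gain leaves the score essentially unchanged, which keeps level differences between systems from dominating the comparison.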

Code & Models

Models

sp-uhh/speech-enhancement-sgmse
model · 12 dl · ♡ 19

Datasets

philgzl/ears
dataset · 161 dl

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development

Methods: Sparse Evolutionary Training