A Semi-spontaneous Dutch Speech Dataset for Speech Enhancement and Speech Recognition

Dimme de Groot; Yuanyuan Zhang; Jorge Martinez; Odette Scharenborg

arXiv:2603.09725 · eess.AS · March 11, 2026

Open Access

TL;DR

This paper introduces DRES, a semi-spontaneous Dutch speech dataset recorded in noisy indoor environments, and evaluates the impact of speech enhancement on recognition performance using state-of-the-art models.

Contribution

The paper contributes DRES, a 1.5-hour realistic Dutch speech dataset from 80 speakers, and an evaluation of five single-channel speech enhancement algorithms and eight off-the-shelf ASR models under real-world noisy conditions.

Findings

01

Five of the eight evaluated ASR models achieved word error rates (WERs) below 22% on DRES.

02

Modern single-channel speech enhancement did not improve ASR performance in realistic scenarios.

03

Evaluation in real-world conditions is crucial for assessing speech processing models.

Abstract

We present DRES: a 1.5-hour Dutch realistic elicited (semi-spontaneous) speech dataset from 80 speakers recorded in noisy, public indoor environments. DRES was designed as a test set for the evaluation of state-of-the-art (SOTA) automatic speech recognition (ASR) and speech enhancement (SE) models in a real-world scenario: a person speaking in a public indoor space with background talkers and noise. The speech was recorded with a four-channel linear microphone array. In this work we evaluate the speech quality of five well-known single-channel SE algorithms and the recognition performance of eight SOTA off-the-shelf ASR models before and after applying SE on the speech of DRES. We found that five out of the eight ASR models have WERs lower than 22% on DRES, despite the challenging conditions. In contrast to recent work, we did not find a positive effect of modern single-channel SE on…
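The WER figures quoted above follow the usual definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a minimal illustration of that definition only. It is not the authors' scoring pipeline; the function name `word_error_rate`, the whitespace tokenization, the absence of text normalization, and the Dutch example sentence are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: splits on whitespace and applies no casing or
    punctuation normalization, which a real evaluation would likely need.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substitution ("zit" -> "zat") and one deleted "de",
# so 2 errors over 6 reference words.
example_wer = word_error_rate("de kat zit op de mat", "de kat zat op mat")
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.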

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis