A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation

Verena Blaschke; Miriam Winkler; Constantin Förster; Gabriele Wenger-Glemser; Barbara Plank

arXiv:2506.02894 · cs.CL · September 30, 2025

TL;DR

This paper introduces Betthupferl, a new dataset of German dialect speech and translations, enabling research on dialectal robustness in ASR and speech translation models.

Contribution

It provides a novel, annotated dialect speech dataset and benchmarks multilingual ASR models on dialect-to-standard German translation tasks.

Findings

01 ASR models show varying accuracy on dialects

02 Models sometimes normalize dialectal grammar

03 Dialectal features influence translation quality

Abstract

Although Germany has a diverse landscape of dialects, they are underrepresented in current automatic speech recognition (ASR) research. To enable studies of how robust models are towards dialectal variation, we present Betthupferl, an evaluation dataset containing four hours of read speech in three dialect groups spoken in Southeast Germany (Franconian, Bavarian, Alemannic), and half an hour of Standard German speech. We provide both dialectal and Standard German transcriptions, and analyze the linguistic differences between them. We benchmark several multilingual state-of-the-art ASR models on speech translation into Standard German, and find differences between how much the output resembles the dialectal vs. standardized transcriptions. Qualitative error analyses of the best ASR model reveal that it sometimes normalizes grammatical differences, but often stays closer to the dialectal…
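
The comparison described in the abstract, scoring one model output against both the dialectal and the Standard German transcription, can be illustrated with a small sketch. It assumes word error rate (WER) as the similarity measure, which this excerpt does not confirm as the paper's metric. The function names and the example sentences are invented for illustration and are not taken from Betthupferl.

```python
def edit_distance(ref_tokens, hyp_tokens):
    """Levenshtein distance between two token lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_tokens) + 1))
    for i, ref_tok in enumerate(ref_tokens, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp_tokens, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: token-level edit distance divided by reference length."""
    ref_tokens = reference.lower().split()
    hyp_tokens = hypothesis.lower().split()
    if not ref_tokens:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Toy strings (hypothetical, not from the dataset): one utterance with
# a dialectal transcription, a Standard German transcription, and a model output.
dialect_ref = "i hob des ned gwusst"
standard_ref = "ich habe das nicht gewusst"
model_output = "ich hab das nicht gewusst"

wer_vs_dialect = wer(dialect_ref, model_output)
wer_vs_standard = wer(standard_ref, model_output)
print(f"WER vs. dialectal reference: {wer_vs_dialect:.2f}")
print(f"WER vs. standard reference:  {wer_vs_standard:.2f}")
```

In this toy case the output is much closer to the standardized reference than to the dialectal one. A lower score against the dialectal reference would instead suggest that the model stays close to the spoken dialect rather than translating it.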
