Does Whisper understand Swiss German? An automatic, qualitative, and human evaluation
Eyal Liron Dolev, Clemens Fidel Lutz, Noëmi Aepli

TL;DR
This study systematically evaluates the Whisper ASR model on Swiss German dialects using automatic, qualitative, and human evaluation, and finds it viable for Swiss German speech recognition when the output is Standard German.
Contribution
The paper provides a comprehensive evaluation of Whisper's ability to transcribe Swiss German, which was previously lacking. It includes a new test set and a multi-faceted analysis.
Findings
Whisper performs well on Swiss German with Standard German output
Automatic and human evaluations show promising results
New test set based on clinical interviews enhances assessment diversity
Abstract
Whisper is a state-of-the-art automatic speech recognition (ASR) model (Radford et al., 2022). Although Swiss German dialects are allegedly not part of Whisper's training data, preliminary experiments showed that Whisper can transcribe Swiss German quite well, with the output being a speech translation into Standard German. To gain a better understanding of Whisper's performance on Swiss German, we systematically evaluate it using automatic, qualitative, and human evaluation. We test its performance on three existing test sets: SwissDial (Dogan-Schönberger et al., 2021), STT4SG-350 (Plüss et al., 2023), and Swiss Parliaments Corpus (Plüss et al., 2021). In addition, we create a new test set for this work, based on short mock clinical interviews. For automatic evaluation, we used word error rate (WER) and BLEU. In the qualitative analysis, we discuss Whisper's strengths and…
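As a rough illustration of the WER metric named in the abstract, here is a minimal sketch: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The function name, the whitespace tokenization, and the example sentences are assumptions made for illustration. The text normalization the authors applied before scoring is not described here, so this is not their evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are split on whitespace with no further
    normalization (casing, punctuation), which real evaluations often add.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical Standard German reference and system output.
print(word_error_rate("das ist ein Test", "das war ein Test"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. When the output is a translation into Standard German, valid paraphrases are also penalized, which is one reason to report a translation metric such as BLEU alongside WER.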
Taxonomy
Topics: Linguistics, Language Diversity, and Identity · Linguistic research and analysis
Methods: Sparse Evolutionary Training
