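Benchmarking von ASR-Modellen im deutschen medizinischen Kontext: Eine Leistungsanalyse anhand von Anamnesegesprächen (Benchmarking ASR Models in the German Medical Context: A Performance Analysis Based on Medical History Interviews)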
Thomas Schuster, Julius Trögele, Nico Döring, Robin Krüger, Matthieu Hoffmann, and Holger Friedrich

TL;DR
This paper evaluates 29 ASR models on German medical conversations, highlighting performance differences and the need for dialect and terminology considerations in medical speech recognition.
Contribution
It provides the first comprehensive benchmark of diverse ASR models specifically for German medical dialogues, including dialects and medical terminology.
Findings
Best models achieve WER below 3%
Significant performance variation among models
Dialect and medical terminology impact accuracy
Abstract
Automatic Speech Recognition (ASR) offers significant potential to reduce the workload of medical personnel, for example, through the automation of documentation tasks. While numerous benchmarks exist for the English language, specific evaluations for the German-speaking medical context are still lacking, particularly regarding the inclusion of dialects. In this article, we present a curated dataset of simulated doctor-patient conversations and evaluate a total of 29 different ASR models. The test field encompasses both open-weights models from the Whisper, Voxtral, and Wav2Vec2 families as well as commercial state-of-the-art APIs (AssemblyAI, Deepgram). For evaluation, we utilize three different metrics (WER, CER, BLEU) and provide an outlook on qualitative semantic analysis. The results demonstrate significant performance differences between the models: while the best systems already…
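Two of the metrics named in the abstract, WER and CER, are both edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the ASR output into the reference transcript, divided by the reference length (in words for WER, in characters for CER). The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The function names and the example sentences are illustrative assumptions, and no text normalization (casing, punctuation) is applied, although a real benchmark would typically define one.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items yields the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example (not taken from the paper's dataset): the hypothesis
# splits the German compound "kopfschmerzen" into two words.
reference = "der patient klagt über starke kopfschmerzen"
hypothesis = "der patient klagt über starke kopf schmerzen"

print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

In this invented example a single inserted space costs two word-level edits out of six reference words but only one character-level edit out of 43 reference characters, which is one reason reporting CER alongside WER can be informative for a compound-heavy language such as German.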
Taxonomy
Topics: Speech and dialogue systems · Artificial Intelligence in Healthcare and Education · Topic Modeling
