Benchmarking von ASR-Modellen im deutschen medizinischen Kontext: Eine Leistungsanalyse anhand von Anamnesegesprächen

Thomas Schuster, Julius Trögele, Nico Döring, Robin Krüger, Matthieu Hoffmann, and Holger Friedrich

arXiv:2601.19945·cs.CL·January 29, 2026

Open Access

TL;DR

This paper evaluates 29 ASR models on German medical conversations, highlighting performance differences and the need for dialect and terminology considerations in medical speech recognition.

Contribution

The paper provides the first comprehensive benchmark of diverse ASR models for German medical dialogues, covering dialects and medical terminology.

Findings

01 Best models achieve WER below 3%

02 Significant performance variation among models

03 Dialect and medical terminology impact accuracy

Abstract

Automatic speech recognition (ASR) could substantially reduce the workload of medical personnel, for example by automating documentation tasks. While numerous benchmarks exist for English, evaluations specific to the German-speaking medical context are still lacking, particularly ones that include dialects. In this article, we present a curated dataset of simulated doctor-patient conversations and evaluate 29 ASR models. The models tested include open-weights models from the Whisper, Voxtral, and Wav2Vec2 families as well as commercial state-of-the-art APIs (AssemblyAI, Deepgram). For evaluation, we use three metrics (WER, CER, BLEU) and provide an outlook on qualitative semantic analysis. The results show significant performance differences between the models: while the best systems already…
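
To make two of the metrics named in the abstract concrete, the following is a minimal sketch of word error rate (WER) and character error rate (CER) in plain Python. It is not the authors' evaluation code: the function names, the whitespace tokenization, and the example sentences are assumptions made for illustration, and any text normalization (casing, punctuation, number formatting) the paper may apply is not reproduced here. Both metrics divide the Levenshtein edit distance between reference and hypothesis by the reference length, counted in words for WER and in characters for CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: simple whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example (not taken from the paper's dataset): the recognizer
# splits a German compound noun into two words.
reference = "der patient klagt über starke kopfschmerzen"
hypothesis = "der patient klagt über starke kopf schmerzen"

print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

In this made-up example a single inserted space counts as two word errors (one substitution and one insertion) but only one character error, which illustrates why WER and CER can diverge on German compounds and why reporting both is informative.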

Taxonomy

Topics: Speech and dialogue systems · Artificial Intelligence in Healthcare and Education · Topic Modeling