Performant ASR Models for Medical Entities in Accented Speech
Tejumade Afonja, Tobi Olatunji, Sewade Ogun, Naome A. Etori, Abraham Owodunni, Moshood Yekini

TL;DR
This paper evaluates several ASR models on accented clinical speech. It shows that medical entities are recognized less accurately than overall WER suggests, and that fine-tuning on accented clinical speech substantially reduces medical WER, which could improve patient safety in healthcare settings.
Contribution
The paper presents a comprehensive evaluation of ASR models on accented medical speech and proposes a novel alignment algorithm for measuring how accurately medical entities are recognized.
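The paper's own alignment algorithm is not described on this page. As a rough illustration of the general idea only, the sketch below uses standard Levenshtein (edit-distance) word alignment, which is an assumption and not necessarily the authors' method, to map each reference word to a hypothesis word and then read off the hypothesis words aligned to a reference entity span. The function names `align_words` and `entity_hypothesis_span`, and the example sentence, are hypothetical.

```python
from typing import List, Optional, Tuple

# (ref_index, hyp_index); None on one side marks a deletion or an insertion.
Pair = Tuple[Optional[int], Optional[int]]


def align_words(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Levenshtein word alignment between a reference and a hypothesis.

    Returns (ref_index, hyp_index) pairs in order. A pair (i, None) is a
    deleted reference word; (None, j) is an inserted hypothesis word.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)

    # Backtrace, preferring match/substitution, then deletion, then insertion.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def entity_hypothesis_span(pairs: List[Pair], start: int, end: int,
                           hyp: List[str]) -> List[str]:
    """Hypothesis words aligned to the reference entity ref[start:end]."""
    inside = [k for k, (r, _) in enumerate(pairs)
              if r is not None and start <= r < end]
    if not inside:
        return []
    # Keep every hypothesis word between the first and last entity word,
    # including insertions that fall strictly inside the span.
    return [hyp[h] for _, h in pairs[inside[0]:inside[-1] + 1] if h is not None]


if __name__ == "__main__":
    ref = "patient was given amoxicillin clavulanate twice daily".split()
    hyp = "patient was given amoxi cillin twice daily".split()
    pairs = align_words(ref, hyp)
    print(entity_hypothesis_span(pairs, 3, 5, hyp))  # ['amoxi', 'cillin']
```

In this sketch, insertions between two entity words count as part of the entity, while insertions at the span boundaries are ignored; that is one of several defensible boundary conventions, and a real evaluation may choose differently.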
Findings
Fine-tuning on accented clinical speech improves medical WER by 25-34% (relative).
Error rates on clinical entities are higher than the overall WER.
Fine-tuned models show better practical applicability in healthcare.
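The two numeric findings can be made concrete with a small worked example. The numbers here are hypothetical, chosen only to illustrate the arithmetic, and are not results from the paper. Suppose a 7-word reference contains a 2-word drug name, and the only two errors are substitutions inside that name (S, D, and I count substitutions, deletions, and insertions; N is the number of reference words). A relative reduction is measured against the baseline error rate, so a drop in medical WER from 40% to 28% is a 30% relative improvement but only 12 points absolute.

```latex
\begin{aligned}
\mathrm{WER}_{\text{overall}} &= \frac{S + D + I}{N} = \frac{2 + 0 + 0}{7} \approx 28.6\%,
\qquad
\mathrm{WER}_{\text{entity}} = \frac{2 + 0 + 0}{2} = 100\% \\[6pt]
\Delta_{\text{rel}} &= \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{ft}}}{\mathrm{WER}_{\text{base}}}
= \frac{0.40 - 0.28}{0.40} = 30\%
\end{aligned}
```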
Abstract
Recent strides in automatic speech recognition (ASR) have accelerated their application in the medical domain, where their performance on accented medical named entities (NE), such as drug names, diagnoses, and lab results, is largely unknown. We rigorously evaluate multiple ASR models on a clinical English dataset of 93 African accents. Our analysis reveals that despite some models achieving low overall word error rates (WER), errors on clinical entities are higher, potentially posing substantial risks to patient safety. To empirically demonstrate this, we extract clinical entities from transcripts, develop a novel algorithm to align ASR predictions with these entities, and compute medical NE recall, medical WER, and character error rate (CER). Our results show that fine-tuning on accented clinical speech improves medical WER by a wide margin (25-34% relative), improving their practical…
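The abstract names three entity-level metrics but does not give their exact definitions here. The sketch below assumes one plausible set: each reference entity has already been paired with its aligned hypothesis text; medical WER is word-level edits divided by reference entity words; CER is character-level edits divided by reference entity characters; and NE recall is the fraction of entities reproduced exactly (case-insensitive). These definitions, the function names `edit_distance` and `medical_metrics`, and the example entities are all hypothetical; the paper may, for instance, compute CER over full transcripts instead.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (x != y),  # match / substitution
                           prev[j] + 1,             # deletion
                           cur[j - 1] + 1))         # insertion
        prev = cur
    return prev[-1]


def medical_metrics(entity_pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Entity-level metrics from (reference entity, aligned hypothesis) pairs."""
    if not entity_pairs:
        raise ValueError("need at least one entity")
    word_errs = ref_words = char_errs = ref_chars = exact_hits = 0
    for ref, hyp in entity_pairs:
        ref_l, hyp_l = ref.lower(), hyp.lower()
        r_words, h_words = ref_l.split(), hyp_l.split()
        word_errs += edit_distance(r_words, h_words)   # word-level edits
        ref_words += len(r_words)
        char_errs += edit_distance(ref_l, hyp_l)       # character-level edits
        ref_chars += len(ref_l)
        exact_hits += int(r_words == h_words)          # entity fully correct
    return {
        "medical_wer": word_errs / ref_words,
        "medical_cer": char_errs / ref_chars,
        "ne_recall": exact_hits / len(entity_pairs),
    }


# Hypothetical example, not data from the paper.
pairs = [
    ("amoxicillin clavulanate", "amoxi cillin"),
    ("malaria", "malaria"),
    ("hemoglobin", "haemoglobin"),
]
print(medical_metrics(pairs))
```

Exact match makes NE recall strict: "haemoglobin" counts as a miss even though its CER contribution is a single character. A softer variant could accept an entity below some CER threshold, at the cost of an extra tunable parameter.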
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: ALIGN
