Performant ASR Models for Medical Entities in Accented Speech

Tejumade Afonja; Tobi Olatunji; Sewade Ogun; Naome A. Etori; Abraham Owodunni; Moshood Yekini

arXiv:2406.12387·eess.AS·June 19, 2024

Open Access

TL;DR

This paper evaluates multiple ASR models on accented clinical English speech. It shows that medical entities are recognized less accurately than overall WER suggests, and that fine-tuning on accented clinical speech improves medical WER by 25-34% (relative), which matters for patient safety.

Contribution

It introduces a comprehensive evaluation of ASR models on accented medical speech and proposes a novel alignment algorithm to measure medical entity recognition performance.

Findings

01. Fine-tuning on accented clinical speech improves medical WER by 25-34% (relative).

02. Error rates on clinical entities are higher than the overall WER.

03. Fine-tuned models are more practically applicable in healthcare settings.

Abstract

Recent strides in automatic speech recognition (ASR) have accelerated its application in the medical domain, where its performance on accented medical named entities (NE) such as drug names, diagnoses, and lab results is largely unknown. We rigorously evaluate multiple ASR models on a clinical English dataset of 93 African accents. Our analysis reveals that despite some models achieving low overall word error rates (WER), errors in clinical entities are higher, potentially posing substantial risks to patient safety. To empirically demonstrate this, we extract clinical entities from transcripts, develop a novel algorithm to align ASR predictions with these entities, and compute medical NE Recall, medical WER, and character error rate. Our results show that fine-tuning on accented clinical speech improves medical WER by a wide margin (25-34% relative), improving their practical…
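The metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's alignment algorithm, which the abstract does not spell out. It assumes a simple stand-in: each reference entity is compared against its best-matching, same-length window of the ASR output. The function names (`wer`, `entity_recall`, `medical_wer`, `relative_reduction`), the example sentence, and the 0.40 and 0.28 figures are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Overall word error rate: word-level edits / reference word count."""
    ref_tokens = ref.lower().split()
    return edit_distance(ref_tokens, hyp.lower().split()) / len(ref_tokens)


def _windows(hyp_tokens: List[str], n: int) -> List[List[str]]:
    """All contiguous n-token windows of the hypothesis (at least one)."""
    return [hyp_tokens[k:k + n]
            for k in range(max(1, len(hyp_tokens) - n + 1))]


def entity_recall(entities: List[str], hyp: str) -> float:
    """Fraction of reference entities that appear verbatim in the hypothesis."""
    hyp_tokens = hyp.lower().split()
    found = sum(
        1 for ent in entities
        if ent.lower().split() in _windows(hyp_tokens, len(ent.split()))
    )
    return found / len(entities)


def medical_wer(entities: List[str], hyp: str) -> float:
    """Assumed stand-in for medical WER: word errors counted on entity
    tokens only, scoring each entity against its closest hypothesis window."""
    hyp_tokens = hyp.lower().split()
    errors, total = 0, 0
    for ent in entities:
        ent_tokens = ent.lower().split()
        errors += min(edit_distance(ent_tokens, w)
                      for w in _windows(hyp_tokens, len(ent_tokens)))
        total += len(ent_tokens)
    return errors / total


def relative_reduction(baseline: float, finetuned: float) -> float:
    """Relative improvement, the sense in which '25-34% relative' is meant."""
    return (baseline - finetuned) / baseline


# Hypothetical example: one drug name is mis-transcribed.
ref = "patient was started on metformin and lisinopril"
hyp = "patient was started on met forming and lisinopril"
entities = ["metformin", "lisinopril"]

print(f"overall WER: {wer(ref, hyp):.3f}")
print(f"medical NE recall: {entity_recall(entities, hyp):.3f}")
print(f"medical WER: {medical_wer(entities, hyp):.3f}")
print(f"relative reduction: {relative_reduction(0.40, 0.28):.2f}")
```

In this toy case a single garbled drug name costs two of seven words overall but one of the two entity tokens, so the entity-level error rate is higher than the overall WER. That is the kind of gap the abstract describes.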

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: ALIGN