Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings
Joel Shor, Ruyue Agnes Bi, Subhashini Venugopalan, Steven Ibara, Roman Goldenberg, Ehud Rivlin

TL;DR
This paper introduces Clinical BERTScore, a new ASR evaluation metric tailored for medical contexts that aligns more closely with clinician preferences than traditional metrics, supported by a new benchmark dataset.
Contribution
The paper presents the Clinical BERTScore, a novel metric for medical ASR evaluation, and releases the Clinician Transcript Preference benchmark dataset for future research.
Findings
CBERTScore aligns better with clinician preferences than existing metrics
The benchmark dataset includes 149 medical sentences with clinician preferences
CBERTScore demonstrates improved correlation with clinical judgments
Abstract
Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slower to adopt this technology, in part due to the importance of avoiding medically-relevant transcription mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR metric that penalizes clinically-relevant mistakes more than others. We demonstrate that this metric more closely aligns with clinician preferences on medical sentences as compared to other metrics (WER, BLEU, METEOR, etc.), sometimes by wide margins. We collect a benchmark of 18 clinician preferences on 149 realistic medical sentences called the Clinician Transcript Preference benchmark (CTP) and make it publicly available for the community to further develop clinically-aware ASR metrics. To our knowledge, this…
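To make "aligns with clinician preferences" concrete, the sketch below scores a baseline metric (WER) by how often it picks the same transcript a clinician preferred. It is only an illustration under stated assumptions: the tuple layout, the `preference_agreement` helper, the tie-handling rule, and the two toy sentences are all hypothetical and are not taken from the paper or the CTP benchmark, and CBERTScore itself is not implemented here.

```python
from typing import Callable, List, Tuple

# (reference, candidate_a, candidate_b, clinician_preferred) where the last
# field is "a" or "b". This layout is an assumption, not the CTP file format.
Example = Tuple[str, str, str, str]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Standard dynamic-programming Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def preference_agreement(examples: List[Example],
                         error_metric: Callable[[str, str], float]) -> float:
    """Fraction of examples where the lower-error candidate is the preferred one.

    Ties are counted as disagreement (an assumption made for this sketch).
    """
    agree = 0
    for reference, cand_a, cand_b, preferred in examples:
        err_a = error_metric(reference, cand_a)
        err_b = error_metric(reference, cand_b)
        if err_a == err_b:
            continue  # the metric cannot tell the candidates apart
        metric_choice = "a" if err_a < err_b else "b"
        agree += int(metric_choice == preferred)
    return agree / len(examples)


# Made-up toy data for illustration only.
toy_examples: List[Example] = [
    # Both candidates have one substitution, but only B flips the meaning.
    ("patient denies chest pain",
     "patient denies chest pains", "patient has chest pain", "a"),
    # A is an exact match, so WER picks it.
    ("no history of diabetes",
     "no history of diabetes", "a history of diabetes", "a"),
]

if __name__ == "__main__":
    print(preference_agreement(toy_examples, word_error_rate))
```

In the first toy pair, WER assigns both candidates the same score even though one reverses the clinical meaning, which is the kind of gap a clinically aware metric is meant to close. Any metric with the same `(reference, hypothesis) -> error` shape could be passed to `preference_agreement` for comparison.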
Taxonomy
Topics: Voice and Speech Disorders · Topic Modeling · Artificial Intelligence in Healthcare and Education
