Assessing ASR Model Quality on Disordered Speech using BERTScore
Jimmy Tobin, Qisheng Li, Subhashini Venugopalan, Katie Seaver, Richard Cave, Katrin Tomanek

TL;DR
This paper explores using BERTScore as an alternative to WER for evaluating ASR models, especially on disordered speech, showing that it aligns better with human judgment and is more tolerant of errors that preserve meaning.
Contribution
It proposes BERTScore, an existing evaluation metric for text generation, as a more informative and robust measure of ASR quality on disordered speech that complements traditional WER.
Findings
BERTScore correlates better with human error assessment than WER.
BERTScore is more robust to orthographic changes like contractions.
BERTScore provides a better fit for error assessment in ASR models.
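The contraction finding can be illustrated from the WER side with a minimal sketch. The function below is a generic word-level edit-distance WER, not the authors' evaluation code, and the example sentences are hypothetical, chosen only to show how WER can penalize a meaning-preserving contraction more heavily than a meaning-changing substitution.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts (not taken from the paper's data).
reference = "i am going home"
contraction = "i'm going home"      # meaning preserved
meaning_change = "i am going rome"  # meaning altered

print(word_error_rate(reference, contraction))     # 0.5
print(word_error_rate(reference, meaning_change))  # 0.25
```

Here the contraction costs two word edits (one substitution, one deletion) while the meaning-changing error costs only one, so WER ranks the harmless transcript as worse. An embedding-based metric such as BERTScore is intended to avoid this kind of inversion; computing it requires a pretrained model and is left out of this sketch.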
Abstract
Word Error Rate (WER) is the primary metric used to assess automatic speech recognition (ASR) model quality. It has been shown that ASR models tend to have much higher WER on speakers with speech impairments than on typical English speakers. It is hard to determine if models can be useful at such high error rates. This study investigates the use of BERTScore, an evaluation metric for text generation, to provide a more informative measure of ASR model quality and usefulness. Both BERTScore and WER were compared to prediction errors manually annotated by Speech Language Pathologists for error type and assessment. BERTScore was found to be more correlated with human assessment of error type and assessment. BERTScore was specifically more robust to orthographic changes (contraction and normalization errors) where meaning was preserved. Furthermore, BERTScore was a better fit of error…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling
Methods: Logistic Regression
