Out of the Box, into the Clinic? Evaluating State-of-the-Art ASR for Clinical Applications for Older Adults
Bram van Dijk, Tiberon Kuiper, Sirin Aoulad si Ahmed, Armel Levebvre, Jake Johnson, Jan Duin, Simon Mooijaart, Marco Spruit

TL;DR
This study evaluates state-of-the-art ASR models on Dutch speech from older adults in a clinical chatbot setting. Generic multilingual models outperform models fine-tuned for this group, truncated generic models help balance accuracy against processing speed, and some inputs still produce high word error rates.
Contribution
The paper shows that generic multilingual ASR models outperform fine-tuned models on Dutch speech from older adults, and it examines the speed-accuracy trade-off in clinical ASR applications.
Findings
Generic models outperform fine-tuned models on older adults' speech
Truncating generic models helps balance the accuracy-speed trade-off
Certain input types cause high word error rates
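For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the paper's evaluation pipeline: the function name, the lowercasing, and the whitespace tokenisation are assumptions made for the example, and the Dutch sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalised by lowercasing and splitting on
    whitespace. Real evaluations often also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Invented example: one of five reference words is dropped.
    print(word_error_rate("ik voel me goed vandaag", "ik voel goed vandaag"))
```

Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference.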
Abstract
Voice-controlled interfaces can support older adults in clinical contexts -- with chatbots being a prime example -- but reliable Automatic Speech Recognition (ASR) for underrepresented groups remains a bottleneck. This study evaluates state-of-the-art ASR models on language use of older Dutch adults, who interacted with the Welzijn.AI chatbot designed for geriatric contexts. We benchmark generic multilingual ASR models, and models fine-tuned for Dutch spoken by older adults, while also considering processing speed. Our results show that generic multilingual models outperform fine-tuned models, which suggests recent ASR models can generalise well out of the box to real-world datasets. Moreover, our results indicate that truncating generic models is helpful in balancing the accuracy-speed trade-off. Nonetheless, we also find inputs which cause a high word error rate and place them in…
Taxonomy
Topics: Assistive Technology in Communication and Mobility · EEG and Brain-Computer Interfaces
