Impact of automatic speech recognition quality on Alzheimer's disease detection from spontaneous speech: a reproducible benchmark study with lexical modeling and statistical validation
Himadri S Samanta

TL;DR
This study shows that higher-quality automatic speech recognition transcripts significantly improve Alzheimer's disease detection from spontaneous speech, emphasizing the importance of ASR choice in clinical language modeling.
Contribution
It provides a reproducible benchmark demonstrating the impact of ASR quality on Alzheimer's detection accuracy using lexical features and interpretable models.
Findings
Models trained on Whisper-small transcripts outperform those trained on Whisper-base transcripts in classification accuracy.
ASR quality has a significant effect on model performance, greater than that of classifier complexity.
High-quality transcripts enable simple lexical models to achieve competitive detection results.
Abstract
Early detection of Alzheimer's disease from spontaneous speech has emerged as a promising non-invasive screening approach. However, the influence of automatic speech recognition (ASR) quality on downstream clinical language modeling remains insufficiently understood. In this study, we investigate Alzheimer's disease detection using lexical features derived from Whisper ASR transcripts on the ADReSSo 2021 diagnosis dataset. We evaluate interpretable machine-learning models, including Logistic Regression and Linear Support Vector Machines, using TF-IDF text representations under repeated 5x5 stratified cross-validation. Our results demonstrate that transcript quality has a statistically significant impact on classification performance. Models trained on Whisper-small transcripts consistently outperform those using Whisper-base transcripts, achieving balanced accuracy above 0.7850 with…
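The evaluation protocol named in the abstract (TF-IDF features, Logistic Regression and a linear SVM, repeated 5x5 stratified cross-validation, balanced accuracy) can be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the authors' code: the function name `evaluate`, the default hyperparameters, the random seed, and the toy transcripts are all assumptions made for the example, and the toy texts are invented placeholders rather than ADReSSo data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def evaluate(texts, labels, seed=0):
    """Score TF-IDF + linear classifiers with repeated 5x5 stratified CV.

    Hypothetical helper: hyperparameters are scikit-learn defaults
    (apart from max_iter), not values taken from the paper.
    """
    # 5 folds repeated 5 times gives 25 balanced-accuracy scores per model.
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=seed)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "linear_svm": LinearSVC(),
    }
    results = {}
    for name, clf in models.items():
        # The vectorizer sits inside the pipeline so it is refit on each
        # training fold and never sees the held-out transcripts.
        pipe = make_pipeline(TfidfVectorizer(), clf)
        results[name] = cross_val_score(
            pipe, texts, labels, cv=cv, scoring="balanced_accuracy"
        )
    return results


# Invented placeholder transcripts (NOT real data), 10 per class.
texts = (
    [f"the boy is taking a cookie from the jar sample {i}" for i in range(10)]
    + [f"um the the thing is uh over there sample {i}" for i in range(10)]
)
labels = [0] * 10 + [1] * 10

results = evaluate(texts, labels)
for name, scores in results.items():
    print(f"{name}: mean balanced accuracy {scores.mean():.3f} over {len(scores)} folds")
```

To compare ASR systems under this sketch, one would call `evaluate` once per transcript set (for example, Whisper-base and Whisper-small transcripts of the same recordings) with the same seed, so both are scored on identical folds.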
