Bridging the Perceptual-Statistical Gap in Dysarthria Assessment: Why Machine Learning Still Falls Short
Krishna Gurugubelli

TL;DR
This paper analyzes why machine learning models for dysarthria assessment lag behind human experts, highlighting perceptual and statistical differences, and proposes strategies to improve clinical reliability and interpretability.
Contribution
It introduces the concept of the perceptual-statistical gap, reviews current methods, and suggests practical strategies and evaluation protocols to bridge the performance gap.
Findings
Models still underperform compared to human experts.
Perceptual features and multimodal approaches can improve assessment.
Evaluation protocols aligned with clinical goals are essential.
Abstract
Automated dysarthria detection and severity assessment from speech have attracted significant research attention due to their potential clinical impact. Despite rapid progress in acoustic modeling and deep learning, models still fall short of human expert performance. This manuscript provides a comprehensive analysis of the reasons behind this gap, emphasizing a conceptual divergence we term the "perceptual-statistical gap". We detail human expert perceptual processes, survey machine learning representations and methods, review existing literature on feature sets and modeling strategies, and present a theoretical analysis of limits imposed by label noise and inter-rater variability. We further outline practical strategies to narrow the gap: perceptually motivated features, self-supervised pretraining, ASR-informed objectives, multimodal fusion, human-in-the-loop training, and…
