Careful Whisper -- leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification
Laurin Wagner, Mario Zusag, Theresa Bloder

TL;DR
This paper introduces an automated speech analysis pipeline combining advanced speech recognition and NLP techniques to accurately classify aphasia subtypes and distinguish affected speech from healthy controls, with potential for broader diagnostic applications.
Contribution
It presents an integrated approach that combines CTC and encoder-decoder ASR models with NLP features for robust, interpretable aphasia classification, reaching 90% accuracy on the most frequently occurring aphasia types.
Findings
90% accuracy in distinguishing the most frequently occurring aphasia types.
Human-level accuracy in distinguishing recordings of people with aphasia from a healthy control group.
Pipeline described as directly applicable to other diseases and languages.
Abstract
This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments. By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate rich acoustic and clean transcripts. We then apply several natural language processing methods to extract features from these transcripts to produce prototypes of healthy speech. Basic distance measures from these prototypes serve as input features for standard machine learning classifiers, yielding human-level accuracy for the distinction between recordings of people with aphasia and a healthy control group. Furthermore, the most frequently occurring aphasia types can be distinguished with 90% accuracy. The pipeline is directly applicable to other diseases and languages, showing promise for robustly…
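The prototype-and-distance step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (words per minute, filler-word ratio), the use of a per-feature mean as the prototype, and the function names build_prototype and distance_features are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def build_prototype(healthy_vectors):
    """Average each feature over the healthy reference recordings (assumed prototype)."""
    return [mean(column) for column in zip(*healthy_vectors)]


def distance_features(vector, prototype):
    """Return per-feature absolute deviations followed by the Euclidean distance."""
    deviations = [abs(v - p) for v, p in zip(vector, prototype)]
    euclidean = sqrt(sum(d * d for d in deviations))
    return deviations + [euclidean]


# Hypothetical transcript features: [words per minute, filler-word ratio]
healthy = [[150.0, 0.02], [140.0, 0.04], [160.0, 0.03]]
prototype = build_prototype(healthy)

# Distance features for one hypothetical recording to be assessed
features = distance_features([60.0, 0.20], prototype)
```

In a pipeline of this kind, vectors such as features would then be passed to a standard classifier; which classifier and which distance measures the paper actually uses is not stated in the abstract.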
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Topic Modeling · Text Readability and Simplification
