CLAP-Based Automatic Word Naming Recognition in Post-Stroke Aphasia
Yacouba Kaloga, Marina Laganaro, Ina Kodrasi

TL;DR
This paper introduces a CLAP-based method for automatic word-naming recognition in patients with post-stroke aphasia. The method handles disfluencies and mispronunciations, which makes automated assessment more reliable.
Contribution
It presents a novel CLAP-based approach that frames word-naming recognition as an audio-text matching task, which improves recognition on challenging speech samples.
Findings
Achieves up to 90% accuracy on two datasets of French post-stroke patients with aphasia
Outperforms existing classification-based and ASR-based baselines
Effective in recognizing disfluent and mispronounced words
Abstract
Conventional automatic word-naming recognition systems struggle to recognize words produced by post-stroke patients with aphasia because of disfluencies and mispronunciations, which limits reliable automated assessment in this population. In this paper, we propose an approach to automatic word-naming recognition based on Contrastive Language-Audio Pretraining (CLAP), which addresses this challenge by leveraging text-audio alignment. Our approach treats word-naming recognition as an audio-text matching problem. It projects speech signals and textual prompts into a shared embedding space so that the intended word can be identified even in challenging recordings. Evaluated on two speech datasets of French post-stroke patients with aphasia, our approach achieves up to 90% accuracy and outperforms existing classification-based and automatic speech recognition (ASR)-based baselines.
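The audio-text matching step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that CLAP's audio and text encoders (not shown) have already produced one embedding for the recording and one embedding for the textual prompt of each candidate word. It uses toy three-dimensional vectors in place of real embeddings and scores candidates by cosine similarity in the shared space, which is the usual CLAP matching score. The function names match_word and accuracy and the French example words are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings in the shared audio-text space."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def match_word(
    audio_emb: Sequence[float], prompt_embs: Dict[str, Sequence[float]]
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical helper: return the candidate word whose prompt embedding
    is most similar to the audio embedding, plus all similarity scores."""
    scores = {word: cosine(audio_emb, emb) for word, emb in prompt_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


def accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Hypothetical helper: fraction of items where the predicted word equals the target."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


if __name__ == "__main__":
    # Toy embeddings standing in for CLAP text-encoder outputs (assumption).
    candidates = {
        "chat": [1.0, 0.0, 0.0],
        "chien": [0.0, 1.0, 0.0],
        "maison": [0.0, 0.0, 1.0],
    }
    # Toy audio embedding, e.g. a disfluent production of "chat" (assumption).
    audio = [0.9, 0.1, 0.0]
    best, scores = match_word(audio, candidates)
    print(best, scores)
```

Because each candidate is scored against the recording rather than transcribed, a distorted production can still land closest to the intended word's prompt, which is the intuition behind treating recognition as matching instead of open-vocabulary ASR.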
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Speech Recognition and Synthesis · Voice and Speech Disorders
