Selfsupervised learning for pathological speech detection

Shakeel Ahmad Sheikh

arXiv:2406.02572·eess.AS·June 6, 2024

TL;DR

This paper explores the use of self-supervised learning embeddings, such as wav2vec2, to improve automatic detection of pathological speech, addressing the scarcity of labeled data and aiming for more accurate, efficient diagnosis.

Contribution

It introduces a novel application of self-supervised speech representations for pathological speech detection, enhancing performance with limited labeled data.
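
As a rough illustration of the general idea (not the paper's actual pipeline), the sketch below mean-pools frame-level embeddings into one vector per utterance and trains a small logistic-regression detector on top. The pooling choice, the classifier, and all function names are assumptions made for this example, and the random arrays are synthetic stand-ins for the frame embeddings that a pretrained model such as wav2vec2 would produce.

```python
import numpy as np

def mean_pool(frames: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, dim) matrix of frame embeddings into one (dim,) vector."""
    return frames.mean(axis=0)

def train_logreg(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1
        grad = p - y                             # gradient of the log loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Label an utterance 1 (pathological) when its logit is positive, else 0."""
    return (X @ w + b > 0).astype(int)

def make_synthetic_utterance(rng, label: int, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for SSL frame embeddings of one utterance.

    Utterances have variable length; class-1 frames are shifted by +1.0
    so that the two classes are separable in this toy setting.
    """
    num_frames = rng.integers(20, 60)
    return rng.normal(loc=1.0 * label, scale=1.0, size=(num_frames, dim))

rng = np.random.default_rng(0)
labels = np.array([0, 1] * 20)  # 0 = neurotypical, 1 = pathological (synthetic labels)
X = np.stack([mean_pool(make_synthetic_utterance(rng, l)) for l in labels])

w, b = train_logreg(X, labels)
train_accuracy = (predict(X, w, b) == labels).mean()
print(f"training accuracy on synthetic data: {train_accuracy:.2f}")
```

Because the upstream representation is frozen in this setup, only the small classifier on top has to be learned from labeled pathological speech, which is why such a design can work with few labeled examples.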

Findings

01

Self-supervised embeddings improve detection accuracy.

02

Multilingual models show robustness across languages.

03

Enhanced representations reduce reliance on large labeled datasets.

Abstract

Speech production is a complex phenomenon, wherein the brain orchestrates a sequence of processes involving thought processing, motor planning, and the execution of articulatory movements. However, this intricate execution of various processes is susceptible to influence and disruption by various neurodegenerative pathological speech disorders, such as Parkinson's disease, resulting in dysarthria, apraxia, and other conditions. These disorders lead to pathological speech characterized by abnormal speech patterns and imprecise articulation. Diagnosing these speech disorders in clinical settings typically involves auditory perceptual tests, which are time-consuming, and the diagnosis can vary among clinicians based on their experiences, biases, and cognitive load during the diagnosis. Additionally, unlike neurotypical speakers, patients with speech pathologies or impairments are unable to…

Taxonomy

Topics: Speech Recognition and Synthesis