Empowering Dysarthric Speech: Leveraging Advanced LLMs for Accurate Speech Correction and Multimodal Emotion Analysis

Kaushal Attaluri; Anirudh CHVS; Sireesha Chittepu

arXiv:2410.12867·cs.CL·October 18, 2024

TL;DR

This paper presents a novel framework that uses advanced large language models and speech recognition techniques to accurately correct dysarthric speech and analyze associated emotions, improving communication for affected individuals.

Contribution

It introduces an innovative combination of speech-to-text conversion, emotion detection, and sentence prediction using fine-tuned LLMs and benchmark models for dysarthric speech enhancement.

Findings

01 High accuracy in speech correction from dysarthric to intended sentences

02 Effective emotion recognition including happiness, sadness, and anger

03 Demonstrated scalability on benchmark models and specialized datasets

Abstract

Dysarthria is a motor speech disorder caused by neurological damage that affects the muscles used for speech production, leading to slurred, slow, or difficult-to-understand speech. It affects millions of individuals worldwide, including those with conditions such as stroke, traumatic brain injury, cerebral palsy, Parkinson's disease, and multiple sclerosis. Dysarthria presents a major communication barrier, impacting quality of life and social interaction. This paper introduces a novel approach to recognizing and translating dysarthric speech, empowering individuals with this condition to communicate more effectively. We leverage advanced large language models for accurate speech correction and multimodal emotion analysis. Dysarthric speech is first converted to text using the OpenAI Whisper model, followed by sentence prediction using fine-tuned open-source models and benchmark models like…
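
The flow described in the abstract (speech-to-text, then sentence prediction, plus emotion analysis) can be sketched as a small pipeline with pluggable stages. This is a minimal illustration, not the authors' implementation: the names `PipelineResult` and `run_pipeline` are hypothetical, each stage is passed in as a plain callable, and the assumption that the emotion stage sees both the audio and the corrected text is ours, made only because the abstract calls the analysis multimodal.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    """Outputs of the three stages (hypothetical container)."""
    transcript: str  # raw speech-to-text output
    corrected: str   # intended sentence predicted from the transcript
    emotion: str     # emotion label, e.g. "happiness", "sadness", "anger"


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    predict_sentence: Callable[[str], str],
    detect_emotion: Callable[[str, str], str],
) -> PipelineResult:
    """Run the three stages in order.

    transcribe:       audio path -> raw transcript (e.g. a Whisper wrapper)
    predict_sentence: raw transcript -> intended sentence (e.g. an LLM call)
    detect_emotion:   (audio path, corrected text) -> emotion label
                      (assumed signature; the source does not specify it)
    """
    # Stage 1: speech-to-text; strip stray whitespace from the recognizer.
    transcript = transcribe(audio_path).strip()
    # Stage 2: map the possibly garbled transcript to the intended sentence.
    corrected = predict_sentence(transcript)
    # Stage 3: emotion label from audio plus corrected text.
    emotion = detect_emotion(audio_path, corrected)
    return PipelineResult(transcript, corrected, emotion)


if __name__ == "__main__":
    # Stub stages with made-up outputs, only to show the call shape.
    result = run_pipeline(
        "clip.wav",
        transcribe=lambda path: " i wan go hom ",
        predict_sentence=lambda text: "I want to go home.",
        detect_emotion=lambda path, text: "sadness",
    )
    print(result)
```

Keeping the stages as injected callables means a real speech recognizer or language model can replace a stub without changing the pipeline itself, and each stage can be evaluated separately.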

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Voice and Speech Disorders

Methods: Dense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Attention Is All You Need · Adam · Linear Layer · Softmax · Multi-Head Attention · Dropout