Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment

Aref Farhadipour; Hadi Veisi

arXiv:2307.03296 · eess.AS · March 22, 2024

TL;DR

This paper introduces a gammatonegram-based image recognition approach using transfer learning with AlexNet for end-to-end dysarthric speech processing, reporting accuracies of 91.29% in speech recognition, 87.74% in speaker identification, and 96.47% in intelligibility assessment.

Contribution

It proposes a novel gammatonegram representation combined with CNN transfer learning for comprehensive dysarthric speech tasks.

Findings

01. Speech recognition accuracy: 91.29%
02. Speaker identification accuracy: 87.74%
03. Intelligibility assessment accuracy: 96.47%

Abstract

Dysarthria is a disability that disturbs the human speech system and reduces the quality and intelligibility of a person's speech. Because of this effect, conventional speech processing systems cannot work properly on impaired speech. This disability is usually associated with physical disabilities. Therefore, designing a system that can perform tasks by receiving voice commands in a smart home can be a significant achievement. In this work, we introduce the gammatonegram as an effective method to represent audio files with discriminative details, which is used as input for a convolutional neural network. In other words, we convert each speech file into an image and propose an image recognition system to classify speech in different scenarios. The proposed CNN is based on transfer learning from the pre-trained AlexNet. In this research, the efficiency of the…
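
To make the "speech file to image" step in the abstract concrete, the following is a minimal sketch of one common way to compute a gammatonegram. It is not the authors' implementation: it uses an FFT-based approximation (weighting short-time power spectra by the approximate power response of 4th-order gammatone filters on an ERB-spaced grid) rather than a time-domain filterbank, and the function names `erb_space` and `gammatonegram`, as well as the band count, frame length, and hop size, are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def erb_space(f_low, f_high, n_bands):
    """Center frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
    to_erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return from_erb(np.linspace(to_erb(f_low), to_erb(f_high), n_bands))


def gammatonegram(signal, sr, n_bands=64, n_fft=512, hop=160, f_low=50.0):
    """Approximate log-energy gammatonegram, shape (n_bands, n_frames).

    All default parameter values are assumptions chosen for illustration.
    """
    # Short-time power spectrum with a Hann window.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

    # ERB-spaced center frequencies and per-band bandwidth parameter b.
    fc = erb_space(f_low, sr / 2.0, n_bands)
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb

    # Approximate power response of a 4th-order gammatone filter:
    # |H(f)|^2 ~ (1 + ((f - fc) / b)^2)^(-4)
    weights = (1.0 + ((freqs[None, :] - fc[:, None]) / b[:, None]) ** 2) ** (-4)

    # Band energies per frame, converted to decibels.
    energies = weights @ power.T  # (n_bands, n_frames)
    return 10.0 * np.log10(energies + 1e-10), fc


# Usage: one second of a 1 kHz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
gram, fc = gammatonegram(np.sin(2 * np.pi * 1000.0 * t), sr)
```

The resulting matrix can then be rescaled to an 8-bit range and resized to the input size expected by the CNN before being saved or passed on as an image; the exact image size and color mapping used in the paper are not stated in this excerpt.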

Code & Models

Repositories

areffarhadi/gammatonegram_cnn_dysarthric_speech
none · Official

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis