Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment
Aref Farhadipour, Hadi Veisi

TL;DR
This paper introduces a gammatonegram-based image recognition approach using transfer learning with AlexNet for end-to-end dysarthric speech processing, achieving high accuracy in recognition, identification, and intelligibility assessment.
Contribution
It proposes a novel gammatonegram representation combined with CNN transfer learning for comprehensive dysarthric speech tasks.
Findings
Speech recognition accuracy: 91.29%
Speaker identification accuracy: 87.74%
Intelligibility assessment accuracy: 96.47%
Abstract
Dysarthria is a disability that causes a disturbance in the human speech system and reduces the quality and intelligibility of a person's speech. Because of this effect, normal speech processing systems cannot work properly on impaired speech. This disability is usually associated with physical disabilities. Therefore, designing a system that can perform tasks by receiving voice commands in a smart home can be a significant achievement. In this work, we introduce the gammatonegram as an effective method to represent audio files with discriminative details, which is used as input for a convolutional neural network. In other words, we convert each speech file into an image and propose an image recognition system to classify speech in different scenarios. The proposed CNN is based on transfer learning from the pre-trained AlexNet. In this research, the efficiency of the…
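To make the "speech file to image" step in the abstract more concrete, the following is a minimal sketch of one common way to approximate a gammatonegram: pool a short-time power spectrum with ERB-spaced, fourth-order gammatone-shaped frequency weights. It is not the authors' implementation. The function names (`erb_space`, `gammatonegram`), the frame length, hop size, number of bands, and lower frequency limit are all illustrative assumptions, and a time-domain gammatone filterbank would be an equally valid alternative.

```python
import numpy as np

EAR_Q = 9.26449   # Glasberg & Moore ERB constants
MIN_BW = 24.7


def erb_space(low_hz, high_hz, n_bands):
    """Center frequencies equally spaced on the ERB-rate scale (ascending)."""
    lo = np.log(low_hz + EAR_Q * MIN_BW)
    hi = np.log(high_hz + EAR_Q * MIN_BW)
    return np.exp(np.linspace(lo, hi, n_bands)) - EAR_Q * MIN_BW


def gammatonegram(signal, sample_rate, n_bands=64, frame_len=400,
                  hop=160, low_hz=50.0):
    """Approximate gammatonegram in dB, shape (n_bands, n_frames).

    Assumes len(signal) >= frame_len. All default values are illustrative
    choices, not settings reported in the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_fft = int(2 ** np.ceil(np.log2(frame_len)))

    # Short-time power spectrum with a Hann window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

    # Gammatone-shaped power weights: [1 + ((f - fc) / b)^2]^(-4),
    # the squared magnitude response of a 4th-order gammatone filter.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    centers = erb_space(low_hz, sample_rate / 2.0, n_bands)
    bandwidth = 1.019 * (MIN_BW + centers / EAR_Q)
    detune = (freqs[None, :] - centers[:, None]) / bandwidth[:, None]
    weights = (1.0 + detune ** 2) ** -4

    energy = weights @ power.T                  # (n_bands, n_frames)
    return 10.0 * np.log10(energy + 1e-10), centers


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    image, centers = gammatonegram(np.sin(2 * np.pi * 1000.0 * t), sr)
    print(image.shape, centers[np.argmax(image.mean(axis=1))])
```

The returned matrix would still have to be rendered as an image and resized to the input size expected by a pre-trained network such as AlexNet; that step, and the fine-tuning itself, are outside the scope of this sketch.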
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis
