Continuous Speech for Improved Learning Pathological Voice Disorders
Syu-Siang Wang, Chi-Te Wang, Chih-Chung Lai, Yu Tsao, Shih-Hau Fang

TL;DR
This study introduces a novel method using continuous Mandarin speech and BiLSTM networks to classify four common voice disorders, achieving significantly improved accuracy over single-vowel approaches.
Contribution
The paper presents a new approach leveraging continuous speech and deep learning for more accurate voice disorder classification, outperforming traditional single-vowel methods.
Findings
Achieved accuracy of 78.12-89.27% in disorder classification.
Significant improvements in unweighted average recall compared with single-vowel systems.
Validated the approach on a large-scale clinical database of 1,045 continuous speech samples collected from 2012 to 2019.
Abstract
Goal: Numerous studies have successfully differentiated normal and abnormal voice samples. Nevertheless, further classification has rarely been attempted. This study proposes a novel approach, using continuous Mandarin speech instead of a single vowel, to classify four common voice disorders (i.e., functional dysphonia, neoplasm, phonotrauma, and vocal palsy). Methods: In the proposed framework, acoustic signals are transformed into mel-frequency cepstral coefficients, and a bidirectional long short-term memory (BiLSTM) network is adopted to model the sequential features. The experiments were conducted on a large-scale database, wherein 1,045 continuous speech samples were collected by the speech clinic of a hospital from 2012 to 2019. Results: Experimental results demonstrated that the proposed framework yields significant accuracy and unweighted average recall improvements of 78.12-89.27% and…
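The abstract reports both accuracy and unweighted average recall (UAR). As a rough illustration of why the two can diverge when disorder classes are imbalanced, the sketch below computes each metric on toy labels. The labels and the helper names (`accuracy`, `unweighted_average_recall`) are assumptions made up for this example; they are not the paper's data or code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, with every class weighted equally."""
    totals = Counter(y_true)  # number of samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy labels, invented for illustration only (not from the paper's database).
y_true = ["phonotrauma"] * 8 + ["vocal palsy"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["phonotrauma"] * 10

acc = accuracy(y_true, y_pred)
uar = unweighted_average_recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, UAR = {uar:.2f}")
```

On these toy labels the majority-class predictor is right on 8 of 10 samples, yet it recovers none of the minority class, so its UAR is only the average of a recall of 1.0 and a recall of 0.0. This is one common reason to report UAR alongside accuracy in multi-class clinical settings.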
Taxonomy
Topics: Voice and Speech Disorders · Dysphagia Assessment and Management · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Memory Network
