Developing vocal system impaired patient-aimed voice quality assessment approach using ASR representation-included multiple features
Shaoxiang Dang, Tetsuya Matsumoto, Yoshinori Takeuchi, Takashi Tsuboi, Yasuhiro Tanaka, Daisuke Nakatsubo, Satoshi Maesawa, Ryuta Saito, Masahisa Katsuno, and Hiroaki Kudo

TL;DR
This paper introduces a novel voice quality assessment method for impaired patients using ASR-based features and self-supervised learning, demonstrating high accuracy and correlation in clinical datasets.
Contribution
It presents an innovative approach combining ASR representations and multiple features to improve voice quality assessment in clinical settings with limited data.
Findings
High correlation (PCC > 0.8) between predicted and rated voice quality on the PVQD dataset
Low prediction error (MSE < 0.5) for the Grade, Breathy, and Asthenic indicators on PVQD
Progress in assessing voice quality of Parkinson's patients post-DBS surgery
Abstract
The potential of deep learning in clinical speech processing is immense, yet the hurdles of limited and imbalanced clinical data samples loom large. This article addresses these challenges by showcasing the utilization of automatic speech recognition and self-supervised learning representations, pre-trained on extensive datasets of normal speech. This innovative approach aims to estimate voice quality of patients with impaired vocal systems. Experiments involve checks on PVQD dataset, covering various causes of vocal system damage in English, and a Japanese dataset focusing on patients with Parkinson's disease before and after undergoing subthalamic nucleus deep brain stimulation (STN-DBS) surgery. The results on PVQD reveal a notable correlation (>0.8 on PCC) and an extraordinary accuracy (<0.5 on MSE) in predicting Grade, Breathy, and Asthenic indicators. Meanwhile, progress has been…
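The two figures quoted for PVQD, the Pearson correlation coefficient (PCC) and the mean squared error (MSE), can be illustrated with a minimal sketch. The function names and the score lists below are hypothetical, made up for illustration; they are not the paper's code or data, and the paper's exact evaluation protocol is not described here.

```python
import math


def pearson_cc(predicted, rated):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(rated) / n
    # Covariance and variances, all left unnormalised: the 1/n factors cancel.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, rated))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in rated)
    return cov / math.sqrt(var_p * var_r)


def mse(predicted, rated):
    """Mean squared error between two equal-length score lists."""
    return sum((p - r) ** 2 for p, r in zip(predicted, rated)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical clinician ratings and model predictions (illustrative only).
    rated = [0.0, 1.0, 2.0, 3.0, 1.5]
    predicted = [0.3, 0.8, 2.4, 2.7, 1.6]
    print("PCC:", pearson_cc(predicted, rated))
    print("MSE:", mse(predicted, rated))
```

PCC measures how well the predictions track the ordering and spread of the ratings regardless of offset or scale, while MSE penalises absolute deviation, which is presumably why the abstract reports both.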
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis
