Speech Disorder Classification Using Extended Factorized Hierarchical Variational Auto-encoders
Jinzi Qi, Hugo Van hamme

TL;DR
This paper proposes an extended Factorized Hierarchical Variational Auto-encoder to improve speech disorder classification by disentangling content and sequence information in disordered speech representations.
Contribution
It introduces an extended FHVAE model that better separates content and sequence features for improved disorder classification from limited data.
Findings
Extended FHVAE improves disentanglement of speech features.
Both content and sequence representations are necessary for optimal classification.
Aggregation at word and sentence levels enhances performance.
Abstract
Objective classification of speech disorders in speakers with communication difficulties is desirable for diagnosis and for administering therapy. Given the current state of speech technology, neural networks are a natural choice for this application, but training them is hampered by a lack of labeled disordered speech data. In this research, we apply an extended version of Factorized Hierarchical Variational Auto-encoders (FHVAE) for representation learning on disordered speech. The FHVAE model extracts both content-related and sequence-related latent variables from speech data, and we use the extracted variables to explore how disorder type information is represented in them. For better classification performance, the latent variables are aggregated at the word and sentence level. We show that an extension of the FHVAE model succeeds in the better…
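The word- and sentence-level aggregation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which aggregation function is used, so mean pooling is an assumption here, and the function name `aggregate_latents`, the toy latent vectors, and the id arrays are all hypothetical.

```python
import numpy as np

def aggregate_latents(latents, group_ids):
    """Mean-pool segment-level latent vectors that share a group id.

    Assumption: mean pooling; the paper's actual aggregation may differ.
    """
    latents = np.asarray(latents, dtype=float)
    group_ids = np.asarray(group_ids)
    pooled = {}
    for gid in np.unique(group_ids):
        # Average every latent vector belonging to this word or sentence.
        pooled[gid.item()] = latents[group_ids == gid].mean(axis=0)
    return pooled

# Toy data: four segment-level latent vectors of dimension 2 (hypothetical).
latents = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
word_ids = [0, 0, 1, 1]      # segments 0-1 form word 0, segments 2-3 form word 1
sentence_ids = [0, 0, 0, 0]  # all four segments belong to sentence 0

word_level = aggregate_latents(latents, word_ids)
sentence_level = aggregate_latents(latents, sentence_ids)
```

The pooled vectors, one per word or per sentence, could then be passed to a downstream disorder classifier in place of the individual segment-level latents.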
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research
