Voice Disorder Analysis: a Transformer-based Approach
Alkis Koudounas, Gabriele Ciravegna, Marco Fantini, Giovanni Succo, Erika Crosetti, Tania Cerquitelli, Elena Baralis

TL;DR
This paper introduces a transformer-based method for non-invasive voice disorder diagnosis. It handles diverse recording types and data scarcity through synthetic data generation, data augmentation, and a Mixture of Experts ensemble, and reports improved detection and classification results over existing approaches.
Contribution
It presents a transformer approach that works directly on raw voice signals, combined with synthetic data generation and a Mixture of Experts ensemble that aligns predictions across recording types such as sentence reading and sustained vowel emission.
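The ensemble is described here only at a high level. As a hypothetical sketch, assume one expert per recording type, each emitting class probabilities, and a gate that assigns a score to every expert; a softmax over the scores of the experts whose recordings are actually available then weights their predictions. In a trained Mixture of Experts the gate scores would come from a learned network rather than fixed numbers, and the function and recording-type names below (`mixture_of_experts`, `sustained_vowel`, `sentence_reading`) are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(
    expert_probs: Dict[str, List[float]],
    gate_logits: Dict[str, float],
) -> List[float]:
    """Weight per-recording-type class probabilities by softmax gate weights.

    Hypothetical sketch: only experts that have both a prediction and a gate
    score are combined, so a patient missing one recording type can still
    be scored from the remaining ones.
    """
    names = [n for n in expert_probs if n in gate_logits]
    if not names:
        raise ValueError("no expert has both a prediction and a gate score")
    num_classes = len(expert_probs[names[0]])
    if any(len(expert_probs[n]) != num_classes for n in names):
        raise ValueError("all experts must predict the same number of classes")

    weights = softmax([gate_logits[n] for n in names])
    combined = [0.0] * num_classes
    for w, n in zip(weights, names):
        for c, p in enumerate(expert_probs[n]):
            combined[c] += w * p
    return combined


if __name__ == "__main__":
    # Illustrative probabilities for [healthy, pathological] from two experts.
    probs = {"sustained_vowel": [0.3, 0.7], "sentence_reading": [0.6, 0.4]}
    gate = {"sustained_vowel": 0.0, "sentence_reading": 0.0}  # equal trust
    print([round(p, 2) for p in mixture_of_experts(probs, gate)])  # [0.45, 0.55]
```

Restricting the softmax to the available experts is one simple way to keep scoring a patient when a recording type is missing; the source does not say whether the authors handle missing recordings this way.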
Findings
Effective in disorder detection and classification
Significant improvement over existing methods
Validated on public and private datasets
Abstract
Voice disorders are pathologies that significantly affect patients' quality of life. However, non-invasive automated diagnosis of these pathologies remains under-explored, due both to a shortage of pathological voice data and to the diversity of recording types used for diagnosis. This paper proposes a novel solution that applies transformers directly to raw voice signals and addresses the data shortage through synthetic data generation and data augmentation. Further, we consider multiple recording types at once, such as sentence reading and sustained vowel emission, by employing a Mixture of Experts ensemble to align the predictions across data types. Experimental results on both public and private datasets show that our solution is effective in the disorder detection and classification tasks and largely improves over existing approaches.
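The abstract mentions data augmentation on raw voice signals without listing the transforms used. As a hypothetical sketch, assuming waveform-level transforms that are common in speech work (random gain, circular time shift, and additive white Gaussian noise at a target signal-to-noise ratio), augmenting a mono waveform stored as a list of floats could look like this. The helper names and parameter ranges are illustrative assumptions, not the authors' pipeline.

```python
import math
import random
from typing import List


def rms(signal: List[float]) -> float:
    """Root-mean-square amplitude of a waveform."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def apply_gain(signal: List[float], gain_db: float) -> List[float]:
    """Scale the waveform by a gain expressed in decibels."""
    factor = 10 ** (gain_db / 20)
    return [x * factor for x in signal]


def time_shift(signal: List[float], shift: int) -> List[float]:
    """Circularly shift samples to the right by `shift` positions."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift]


def add_white_noise(signal: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise at (approximately) the requested SNR in dB."""
    signal_power = rms(signal) ** 2
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def augment(signal: List[float], rng: random.Random) -> List[float]:
    """Apply a random gain, circular shift and noise level (hypothetical ranges)."""
    out = apply_gain(signal, rng.uniform(-6.0, 6.0))
    out = time_shift(out, rng.randrange(len(out)))
    return add_white_noise(out, rng.uniform(15.0, 30.0), rng)


if __name__ == "__main__":
    sr = 16_000
    # 1 s synthetic tone standing in for a sustained vowel recording.
    tone = [0.5 * math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]
    augmented = augment(tone, random.Random(0))
    print(len(augmented))  # 16000
```

A circular shift is harmless for a stationary sustained vowel but wraps the end of a read sentence onto its start, so a real pipeline might use padding-based shifts for sentence recordings instead.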
