Voice Disorder Analysis: a Transformer-based Approach

Alkis Koudounas; Gabriele Ciravegna; Marco Fantini; Giovanni Succo; Erika Crosetti; Tania Cerquitelli; Elena Baralis

arXiv:2406.14693 · eess.AS · September 17, 2024 · 1 citation

TL;DR

This paper introduces a transformer-based method for non-invasive voice disorder diagnosis. It handles diverse recording types with an ensemble and addresses data scarcity with synthetic data and augmentation, and it reports better detection and classification results than existing approaches on public and private datasets.

Contribution

It presents a novel transformer approach that operates directly on raw voice signals, combined with synthetic data generation and a Mixture of Experts ensemble that handles multiple recording types.

Findings

01. Effective in disorder detection and classification
02. Significant improvement over existing methods
03. Validated on public and private datasets

Abstract

Voice disorders are pathologies that significantly affect patient quality of life. However, non-invasive automated diagnosis of these pathologies is still under-explored, due both to a shortage of pathological voice data and to the diversity of the recording types used for diagnosis. This paper proposes a novel solution that adopts transformers working directly on raw voice signals and addresses the data shortage through synthetic data generation and data augmentation. Further, we consider multiple recording types at the same time, such as sentence reading and sustained vowel emission, by employing a Mixture of Experts ensemble to align the predictions on different data types. The experimental results, obtained on both public and private datasets, show the effectiveness of our solution in the disorder detection and classification tasks and a large improvement over existing approaches.
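To make the ensemble idea in the abstract concrete, here is a minimal sketch of how predictions from per-recording-type experts could be combined with gate weights. It is an illustration under stated assumptions, not the authors' implementation: the function names, the softmax gate, the two-class setup, and the example logits are all hypothetical, and the paper's actual experts are transformers whose gating mechanism is not described on this page.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(expert_logits, gate_scores):
    """Blend per-expert class probabilities using softmax gate weights.

    expert_logits: one list of class logits per expert (hypothetical:
                   one expert per recording type).
    gate_scores:   one unnormalized score per expert (hypothetical gate).
    """
    weights = softmax(gate_scores)
    expert_probs = [softmax(logits) for logits in expert_logits]
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]


# Assumed example: logits for [healthy, pathological] from two experts.
sentence_logits = [0.2, 1.1]  # expert for sentence-reading recordings
vowel_logits = [1.5, 0.3]     # expert for sustained-vowel recordings
gate_scores = [0.0, 0.0]      # equal trust in both experts

combined = mixture_of_experts([sentence_logits, vowel_logits], gate_scores)
print(combined)
```

Because the gate weights sum to one and each expert outputs a probability distribution, the blended output is itself a distribution over classes. In a learned system the gate scores would typically be produced by a trainable component rather than fixed by hand, which is the assumption this sketch leaves open.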

Code & Models

Repositories

koudounasalkis/AI4Voice (PyTorch, official)

Taxonomy

Topics: Voice and Speech Disorders

Methods: ALIGN