MVP: Multi-source Voice Pathology detection

Alkis Koudounas; Moreno La Quatra; Gabriele Ciravegna; Marco Fantini; Erika Crosetti; Giovanni Succo; Tania Cerquitelli; Sabato Marco Siniscalchi; Elena Baralis

arXiv:2505.20050·eess.AS·May 27, 2025

TL;DR

This paper presents MVP, a transformer-based method for multi-source voice pathology detection that combines sentence-reading and sustained-vowel recordings, improving AUC by up to 13% over single-source methods across German, Portuguese, and Italian.

Contribution

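Introduces a multi-source voice pathology detection approach that applies transformers directly to raw voice signals, and compares three strategies for fusing sentence-reading and sustained-vowel recordings: waveform concatenation, intermediate feature fusion, and decision-level combination.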

Findings

01 Intermediate feature fusion with transformers yields the best performance.

02 Up to 13% AUC improvement over single-source methods.

03 Effective across German, Portuguese, and Italian.

Abstract

Voice disorders significantly impact patient quality of life, yet non-invasive automated diagnosis remains under-explored due to both the scarcity of pathological voice data and the variability in recording sources. This work introduces MVP (Multi-source Voice Pathology detection), a novel approach that leverages transformers operating directly on raw voice signals. We explore three fusion strategies to combine sentence reading and sustained vowel recordings: waveform concatenation, intermediate feature fusion, and decision-level combination. Empirical validation across German, Portuguese, and Italian shows that intermediate feature fusion using transformers best captures the complementary characteristics of both recording types. Our approach achieves up to a 13% AUC improvement over single-source methods.
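
The three fusion strategies named in the abstract differ in where the two recordings are merged: at the input, after encoding, or after classification. The following is a minimal sketch of that distinction only, not the authors' implementation. `toy_encoder`, `pool`, and `toy_head` are hypothetical stand-ins (per-frame statistics and a sigmoid) for the transformer encoder and classifier, and the specific way features are merged in the intermediate variant is an assumption, since this page does not describe it.

```python
import math
from typing import List, Sequence

def toy_encoder(wave: Sequence[float], frame: int = 4) -> List[List[float]]:
    """Hypothetical stand-in for a transformer encoder: [mean, energy] per frame."""
    if len(wave) < frame:
        raise ValueError("waveform shorter than one frame")
    feats = []
    for i in range(0, len(wave) - frame + 1, frame):
        chunk = wave[i:i + frame]
        feats.append([sum(chunk) / frame, sum(x * x for x in chunk) / frame])
    return feats

def pool(feats: List[List[float]]) -> List[float]:
    """Mean-pool frame features over time into one vector."""
    return [sum(col) / len(feats) for col in zip(*feats)]

def toy_head(vec: Sequence[float]) -> float:
    """Hypothetical classifier head: sigmoid of the mean feature value."""
    return 1.0 / (1.0 + math.exp(-sum(vec) / len(vec)))

def waveform_concatenation(sentence, vowel) -> float:
    # Merge at the input: one encoder sees the joined raw signal.
    return toy_head(pool(toy_encoder(list(sentence) + list(vowel))))

def intermediate_feature_fusion(sentence, vowel) -> float:
    # Merge after encoding: each recording is encoded separately, then the
    # pooled feature vectors are concatenated (assumed merge operation).
    fused = pool(toy_encoder(sentence)) + pool(toy_encoder(vowel))
    return toy_head(fused)

def decision_level_combination(sentence, vowel) -> float:
    # Merge after classification: average the two per-recording scores.
    s = toy_head(pool(toy_encoder(sentence)))
    v = toy_head(pool(toy_encoder(vowel)))
    return 0.5 * (s + v)

# Made-up sample values, for illustration only.
sentence = [0.1, -0.2, 0.3, 0.0, 0.2, 0.1, -0.1, 0.0]
vowel = [0.5, 0.5, 0.4, 0.6]
scores = {
    "waveform": waveform_concatenation(sentence, vowel),
    "intermediate": intermediate_feature_fusion(sentence, vowel),
    "decision": decision_level_combination(sentence, vowel),
}
```

In a real system the stand-ins would be replaced by a pretrained speech transformer and a trained classifier; the sketch only shows how the merge point moves between the three variants.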

Code & Models

Repositories

koudounasalkis/mvp (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders