Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature

Jan Vrba; Jakub Steinbach; Tomáš Jirsa; Laura Verde; Roberta De Fazio; Yuwen Zeng; Kei Ichiji; Lukáš Hájek; Zuzana Sedláková; Zuzana Urbániová; Martin Chovanec; Jan Mareš; Noriyasu Homma

arXiv:2410.10537·cs.SD·April 15, 2025

TL;DR

This paper presents a reproducible machine learning methodology for voice pathology detection using a novel pitch difference feature and publicly available data, achieving high recall rates across genders.

Contribution

The paper introduces two new features, pitch difference and the NaN feature, together with a comprehensive evaluation framework, improving both reproducibility and effectiveness in voice pathology detection.
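The two features can be illustrated with a minimal sketch. The abstract describes pitch difference only as a "relative variation in fundamental frequency" and the NaN feature only as "failed fundamental frequency estimation", so the exact formulas below are assumptions made for illustration: pitch difference is taken as the range of the estimated f0 divided by its minimum, and the NaN feature as a flag that is set when any frame has no f0 estimate. The function name `pitch_features` is hypothetical and is not taken from the paper or its repository.

```python
import math


def pitch_features(f0_track):
    """Return (pitch_difference, nan_feature) for a list of per-frame f0 values.

    ASSUMPTION: both definitions are illustrative guesses, not the paper's
    formulas. Frames where f0 estimation failed are encoded as float('nan').
    """
    voiced = [f for f in f0_track if not math.isnan(f)]
    # Assumed NaN feature: True if f0 estimation failed on at least one frame.
    nan_feature = len(voiced) < len(f0_track)
    if not voiced:
        # No usable f0 at all, so the relative variation is undefined.
        return float("nan"), nan_feature
    # Assumed pitch difference: f0 range relative to the lowest f0.
    pitch_difference = (max(voiced) - min(voiced)) / min(voiced)
    return pitch_difference, nan_feature


# Toy f0 track in Hz with one failed frame.
f0 = [200.0, 210.0, float("nan"), 220.0]
pitch_difference, nan_feature = pitch_features(f0)
```

For this toy track the range is 20 Hz over a minimum of 200 Hz, so the assumed pitch difference is 0.1 and the flag is set.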

Findings

01 Achieved approximately 85.6% unweighted average recall (UAR)

02 Validated the effectiveness of the novel pitch difference and NaN features in pathology detection

03 Provided a publicly available code repository for reproducibility
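Unweighted average recall, the metric in the first finding, is the plain mean of the per-class recalls, so every class counts equally regardless of its size. The sketch below computes it in pure Python on made-up labels; the data and the function name are illustrative only and do not come from the paper.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class has equal weight."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy imbalanced data: 8 pathological (1) and 2 healthy (0) recordings.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# A degenerate classifier that always predicts the majority class.
y_pred = [1] * 10

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here accuracy is 0.8 while UAR is only 0.5, which is why UAR is a common choice for imbalanced datasets.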

Abstract

Purpose: We introduce a novel methodology for voice pathology detection using the publicly available Saarbrücken Voice Database (SVD) and a robust feature set combining commonly used acoustic handcrafted features with two novel ones: pitch difference (relative variation in fundamental frequency) and NaN feature (failed fundamental frequency estimation). Methods: We evaluate six machine learning (ML) algorithms (support vector machine, k-nearest neighbors, naive Bayes, decision tree, random forest, and AdaBoost) using grid search over feasible hyperparameters and 20480 different feature subsets. For each ML algorithm, the top 1000 combinations of classification model and feature subset are validated with repeated stratified cross-validation. To address class imbalance, we apply K-Means SMOTE to augment the training data. Results: Our approach achieves 85.61%, 84.69% and 85.22%…
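The validation scheme named in the abstract, repeated stratified cross-validation, can be sketched in pure Python. This is not the authors' code: the fold count, repeat count, seed, and the round-robin assignment are assumptions chosen for illustration, and the oversampling step (K-Means SMOTE in the paper) is only marked by a comment at the point where it would act on the training indices.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=2, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold gets its share.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            # Oversampling would be applied to train_idx only, never test_idx.
            yield train_idx, test_idx


# Toy labels: 10 samples of class 0 and 20 of class 1.
labels = [0] * 10 + [1] * 20
splits = list(repeated_stratified_kfold(labels, n_splits=5, n_repeats=2))
```

With these toy labels each of the 10 splits holds out 6 samples, 2 from class 0 and 4 from class 1, matching the 1:2 class ratio of the full set.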

Code & Models

Repositories

aailab-uct/automated-robust-and-reproducible-voice-pathology-detection (official)

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Synthetic Minority Over-sampling Technique · Support Vector Machine