Voice Pathology Detection Using Phonation
Sri Raksha Siva, Nived Suthahar, Prakash Boominathan, Uma Ranjan

TL;DR
This paper presents a noninvasive, machine learning-based framework utilizing phonation data and acoustic features to accurately detect voice pathologies, aiming to improve early diagnosis and patient outcomes.
Contribution
It introduces a novel combination of acoustic features, RNN models with attention, and data augmentation techniques for voice pathology detection.
Findings
RNN models achieve high classification accuracy.
Data augmentation improves model robustness.
Scale-based features enhance detection of irregularities.
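One scale-based feature of this kind is the Hurst exponent, which summarizes how the fluctuations of a signal grow with the length of the observation window. The summary does not say which estimator the authors use, so the sketch below assumes classical rescaled-range (R/S) analysis. The function names `rescaled_range` and `hurst_rs` and the `min_window` parameter are illustrative, not taken from the paper.

```python
import math


def rescaled_range(chunk):
    """Return the R/S statistic of one window, or None if the window is constant."""
    n = len(chunk)
    mean = sum(chunk) / n
    cum = 0.0
    lo = hi = 0.0
    # Track the range of the cumulative deviations from the window mean.
    for x in chunk:
        cum += x - mean
        lo = min(lo, cum)
        hi = max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    if std == 0.0:
        return None
    return (hi - lo) / std


def hurst_rs(signal, min_window=8):
    """Estimate the Hurst exponent as the slope of log(R/S) against log(window size).

    Assumption: R/S analysis with window sizes doubling from `min_window`;
    the paper's actual estimator is not stated in this summary.
    """
    n = len(signal)
    log_sizes, log_rs = [], []
    size = min_window
    while size <= n // 2:
        values = []
        # Non-overlapping windows of the current size.
        for start in range(0, n - size + 1, size):
            rs = rescaled_range(signal[start:start + size])
            if rs is not None:
                values.append(rs)
        if values:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(sum(values) / len(values)))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("signal too short for the chosen min_window")
    # Ordinary least-squares slope of the log-log relationship.
    mx = sum(log_sizes) / len(log_sizes)
    my = sum(log_rs) / len(log_rs)
    num = sum((x - mx) * (y - my) for x, y in zip(log_sizes, log_rs))
    den = sum((x - mx) ** 2 for x in log_sizes)
    return num / den
```

With this estimator, uncorrelated noise tends to give a value near 0.5 and a strongly persistent signal such as a random walk gives a value closer to 1, so the exponent can serve as a single-number descriptor of irregularity in a phonation recording.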
Abstract
Voice disorders significantly affect communication and quality of life, requiring an early and accurate diagnosis. Traditional methods like laryngoscopy are invasive, subjective, and often inaccessible. This research proposes a noninvasive, machine learning-based framework for detecting voice pathologies using phonation data. Phonation data from the Saarbrücken Voice Database are analyzed using acoustic features such as Mel Frequency Cepstral Coefficients (MFCCs), chroma features, and Mel spectrograms. Recurrent Neural Networks (RNNs), including LSTM and attention mechanisms, classify samples into normal and pathological categories. Data augmentation techniques, including pitch shifting and Gaussian noise addition, enhance model generalizability, while preprocessing ensures signal quality. Scale-based features, such as Hölder and Hurst exponents, further capture signal…
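The Gaussian noise augmentation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the summary gives no noise level, so the `snr_db` parameter (target signal-to-noise ratio in decibels), its default of 20 dB, and the function name `add_gaussian_noise` are all assumptions.

```python
import math
import random


def add_gaussian_noise(signal, snr_db=20.0, seed=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    Assumption: the noise variance is chosen so that the ratio of signal
    power to noise power matches `snr_db`; the paper's setting is not
    given in this summary.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # Convert the target SNR from decibels to a linear power ratio.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]
```

Each training recording could be passed through this function with a different seed to produce extra noisy copies, while validation and test recordings stay untouched so that reported accuracy reflects clean data.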
Taxonomy
Topics: Voice and Speech Disorders · Respiratory and Cough-Related Research · Phonocardiography and Auscultation Techniques
