Towards detecting the pathological subharmonic voicing with fully convolutional neural networks

Takeshi Ikuma, Melda Kunduk, Brad Story, and Andrew J. McWhorter

arXiv:2501.09159 · eess.AS · January 17, 2025

TL;DR

This paper proposes a deep learning approach that uses fully convolutional neural networks, trained on synthetic voice signals, to detect subharmonic phonation (a marker of many voice disorders) by classifying the subharmonic period. It achieves over 98% classification accuracy on synthetic test signals.

Contribution

It introduces a novel method based on fully convolutional neural networks, trained on synthetic data, for detecting subharmonic voicing, a challenging problem in voice disorder analysis.
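The contribution hinges on the network being fully convolutional. As a rough illustration only, the Python sketch below runs a forward pass through such a network: strided convolutional layers, a kernel-size-1 convolution that produces one score track per class, and global average pooling over time, so a signal of any sufficiently long length maps to one probability per class. Everything here is a hypothetical assumption rather than the authors' architecture: the class name `TinyFCN`, the layer count, channel widths, kernel size, stride, the random untrained weights, and the four-class label set (subharmonic period 1, meaning no subharmonics, through 4).

```python
import math
import random


def conv1d(x, weights, biases, stride=1):
    """'Valid' 1-D convolution (cross-correlation, as in most DL libraries).

    x             -- list of input channels, each a list of samples
    weights[o][i] -- kernel (list of taps) from input channel i to output channel o
    biases[o]     -- bias added to output channel o
    """
    k = len(weights[0][0])
    n_out = (len(x[0]) - k) // stride + 1
    out = []
    for w_o, b_o in zip(weights, biases):
        row = []
        for t in range(n_out):
            start = t * stride
            s = b_o
            for w_oi, x_i in zip(w_o, x):
                for j in range(k):
                    s += w_oi[j] * x_i[start + j]
            row.append(s)
        out.append(row)
    return out


def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]


def init_layer(n_in, n_out, k, rng):
    # He-style random init (std = sqrt(2 / fan_in)); untrained, for illustration only.
    std = math.sqrt(2.0 / (n_in * k))
    w = [[[rng.gauss(0.0, std) for _ in range(k)] for _ in range(n_in)]
         for _ in range(n_out)]
    return w, [0.0] * n_out


class TinyFCN:
    """Hypothetical fully convolutional classifier: conv layers -> 1x1 conv -> global average pooling."""

    def __init__(self, n_classes, channels=(1, 8, 8), kernel=5, seed=0):
        rng = random.Random(seed)
        self.hidden = [init_layer(channels[i], channels[i + 1], kernel, rng)
                       for i in range(len(channels) - 1)]
        # Kernel size 1 maps feature channels to one score track per class.
        self.head = init_layer(channels[-1], n_classes, 1, rng)

    def forward(self, signal):
        h = [list(signal)]  # a single input channel
        for w, b in self.hidden:
            h = relu(conv1d(h, w, b, stride=2))  # stride 2 roughly halves the time axis
        scores = conv1d(h, *self.head)
        # Global average pooling over time: any input length -> one logit per class.
        logits = [sum(track) / len(track) for track in scores]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]


# Usage: the same (untrained) model accepts inputs of different lengths.
model = TinyFCN(n_classes=4)
probs_short = model.forward([math.sin(0.1 * n) for n in range(400)])
probs_long = model.forward([math.sin(0.1 * n) for n in range(1000)])
```

Because no dense layer fixes the input size, the same model handles the 400-sample and 1000-sample inputs alike (with these defaults, any input of at least 13 samples). In the paper's setting the weights would instead be learned from synthesized subharmonic voice signals.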

Findings

1. Over 98% classification accuracy on synthetic data
2. Encouraging results on real sustained vowel recordings
3. Identifies areas for future improvement

Abstract

Many voice disorders induce subharmonic phonation, but voice signal analysis currently lacks a technique to reliably detect the presence of subharmonics. Distinguishing subharmonic phonation from normal phonation is challenging because both are nearly periodic phenomena. Subharmonic phonation adds cyclical variations to the normal glottal cycles; hence, estimating the subharmonic period requires a holistic analysis of the signal that spans multiple glottal cycles rather than a single one. Deep learning is an effective approach to this type of complex problem. This paper describes fully convolutional neural networks that are trained with synthesized subharmonic voice signals to classify the subharmonic period. Synthetic evaluation shows over 98% classification accuracy, and assessment of sustained vowel recordings demonstrates encouraging outcomes as well as areas for future improvement.
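To make the "cyclical variations" concrete, the Python sketch below synthesizes an idealized pulse train in which the first glottal cycle of every subharmonic cycle is louder than the rest, then applies a simple classical check that is not the paper's method: the subharmonic period is taken as the smallest number of glottal cycles m for which the signal matches a copy of itself shifted by m cycles. The function names, the raised-cosine pulse shape, the amplitude-only modulation, the fixed cycle length, the `depth` value, and the 0.99 threshold are all assumptions for illustration; real subharmonic voices can also vary cycle length and pulse shape.

```python
import math


def synth_subharmonic(period, sub_period, n_cycles, depth=0.4):
    """Hypothetical, idealized 'glottal' pulse train with an amplitude subharmonic.

    period     -- samples per glottal cycle (assumed constant, i.e. no jitter)
    sub_period -- glottal cycles per subharmonic cycle (1 = no subharmonics)
    depth      -- fractional amplitude drop for all but the first cycle
                  of each subharmonic cycle (assumed modulation model)
    """
    # One raised-cosine (Hann-shaped) pulse per glottal cycle.
    pulse = [0.5 * (1.0 - math.cos(2.0 * math.pi * n / period)) for n in range(period)]
    signal = []
    for c in range(n_cycles):
        amp = 1.0 if c % sub_period == 0 else 1.0 - depth
        signal.extend(amp * p for p in pulse)
    return signal


def normalized_corr(x, lag):
    """Normalized (non-mean-removed) correlation between x[n] and x[n + lag]."""
    a, b = x[:-lag], x[lag:]
    num = sum(u * v for u, v in zip(a, b))
    den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    return num / den


def estimate_sub_period(x, period, max_sub=4, threshold=0.99):
    """Smallest m (in glottal cycles) whose m-cycle shift reproduces the signal.

    Requires len(x) > max_sub * period; returns None if no shift matches.
    """
    for m in range(1, max_sub + 1):
        if normalized_corr(x, m * period) >= threshold:
            return m
    return None
```

For example, `estimate_sub_period(synth_subharmonic(80, 2, 40), 80)` returns 2, whereas comparing only adjacent cycles (`normalized_corr(x, 80)`) gives about 0.88, which shows that neighboring cycles differ but not how many make up a subharmonic period. The check needs a window covering more than `max_sub` glottal cycles, echoing the need for a multi-cycle view, and it presumes a known, perfectly constant glottal period, an assumption that jitter and weak modulation in real voices would likely break.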

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing