Comparison of fundamental frequency estimators with subharmonic voice signals

Takeshi Ikuma, Melda Kunduk, and Andrew J. McWhorter

arXiv:2501.04789·eess.AS·January 10, 2025


TL;DR

This study compares five fundamental frequency (F0) estimators (Praat, YAAPT, Harvest, CREPE, and FCN-F0) for clinical voice analysis. FCN-F0 is the most accurate, including at resolving subharmonic voicing, which must be handled correctly to avoid false negatives in acoustic parameter assessment.

Contribution

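The paper compares five F0 estimators on sustained vowels, using a quality-of-estimate classification to identify subharmonic errors and the subharmonics-to-harmonics ratio (SHR) to measure the strength of subharmonic voicing. It shows that a deep-learning model, FCN-F0, is the most effective at resolving subharmonic signals.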

Findings

1. FCN-F0 outperforms the other estimators in accuracy.
2. CREPE and Harvest are also highly capable.
3. Subharmonic detection is critical for clinical voice analysis.

Abstract

In clinical voice signal analysis, mishandling of subharmonic voicing may cause an acoustic parameter to signal false negatives. As such, the ability of a fundamental frequency estimator to identify the speaking fundamental frequency is critical. This paper presents a sustained-vowel study, which used a quality-of-estimate classification to identify subharmonic errors and the subharmonics-to-harmonics ratio (SHR) to measure the strength of subharmonic voicing. Five estimators were studied with a sustained vowel dataset: Praat, YAAPT, Harvest, CREPE, and FCN-F0. FCN-F0, a deep-learning model, performed the best both in overall accuracy and in correctly resolving subharmonic signals. CREPE and Harvest are also highly capable estimators for sustained vowel analysis.
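The two measurement ideas in the abstract, an SHR for subharmonic strength and a per-frame quality-of-estimate label, can be illustrated with a small sketch. This is not the authors' implementation. The SHR here is a simplified linear-spectrum version of the usual idea (summed magnitude at half-integer multiples of F0 divided by summed magnitude at integer multiples), and the label names, the 50-cent tolerance, and the functions `shr` and `classify_estimate` are hypothetical choices made for illustration only.

```python
import math
from typing import Optional, Sequence


def shr(mag: Sequence[float], df: float, f0: float, n_harmonics: int = 10) -> float:
    """Simplified subharmonics-to-harmonics ratio (hypothetical sketch).

    mag: magnitude spectrum sampled every df Hz (bin k corresponds to k * df Hz).
    Sums magnitudes at the harmonics n*f0 (SH) and at the subharmonics
    (n - 1/2)*f0 (SS) for n = 1..n_harmonics, and returns SS / SH.
    """
    def amp(freq: float) -> float:
        k = int(round(freq / df))  # nearest spectral bin
        return mag[k] if 0 <= k < len(mag) else 0.0

    sh = sum(amp(n * f0) for n in range(1, n_harmonics + 1))
    ss = sum(amp((n - 0.5) * f0) for n in range(1, n_harmonics + 1))
    return ss / sh if sh > 0 else float("nan")


def classify_estimate(f_est: Optional[float], f_ref: float, tol_cents: float = 50.0) -> str:
    """Label one F0 estimate against a reference speaking F0 (hypothetical labels).

    Deviations are measured in cents; tol_cents is an assumed tolerance.
    """
    if f_est is None or f_est <= 0:
        return "unvoiced"
    cents = 1200.0 * math.log2(f_est / f_ref)
    if abs(cents) <= tol_cents:
        return "correct"
    # Estimate locked onto a subharmonic, e.g. f_ref/2 or f_ref/3.
    for k in (2, 3):
        if abs(cents + 1200.0 * math.log2(k)) <= tol_cents:
            return "subharmonic"
    if abs(cents - 1200.0) <= tol_cents:
        return "octave-up"
    return "gross"


if __name__ == "__main__":
    # Synthetic spectrum: 1 Hz bins, harmonics of 200 Hz at magnitude 1.0,
    # period-doubling subharmonics (100, 300, ... Hz) at magnitude 0.2.
    mag = [0.0] * 2000
    for n in range(1, 6):
        mag[200 * n] = 1.0
        mag[200 * n - 100] = 0.2
    print(shr(mag, df=1.0, f0=200.0, n_harmonics=5))
    print(classify_estimate(100.0, 200.0))
```

In this framing, an estimator that reports roughly half of the speaking F0 on a strongly subharmonic frame (high SHR) would be counted as a subharmonic error rather than a correct estimate.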


Taxonomy

Topics: Speech and Audio Processing