Noisy Speech Based Temporal Decomposition to Improve Fundamental Frequency Estimation
A. Queiroz, R. Coelho

TL;DR
This paper presents a noise-robust method for fundamental frequency (F0) estimation that uses ensemble empirical mode decomposition to classify noisy speech frames as low-frequency or high-frequency, then uses that classification to correct F0 candidates from a conventional detector.
Contribution
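The approach extracts pitch information from the first modes of an ensemble empirical mode decomposition to decide whether each frame's F0 lies in a low-frequency or high-frequency region, and applies this separation to improve F0 estimation accuracy in noisy speech.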
Findings
Outperforms existing methods in frequency separation accuracy.
Improves F0 estimation accuracy across various noise conditions.
Effective in low SNR scenarios.
Abstract
This paper introduces a novel method to separate noisy speech into low- or high-frequency frames in order to improve fundamental frequency (F0) estimation accuracy. In this proposal, the target signal is analyzed by means of the ensemble empirical mode decomposition. Next, the pitch information is extracted from the first decomposition modes. This feature indicates the frequency region where the F0 of speech should be located, thus separating the frames into low-frequency (LF) or high-frequency (HF). The separation is applied to correct candidates extracted from a conventional fundamental frequency detection method, thereby improving the accuracy of the F0 estimate. The proposed method is evaluated in experiments with the CSTR and TIMIT databases, considering six acoustic noises under various signal-to-noise ratios. A pitch enhancement algorithm is adopted as baseline in the evaluation…
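The correction step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the actual correction rule, so everything here is an assumption made for illustration: the function names `classify_frame` and `correct_f0`, the 200 Hz LF/HF boundary, the 50 to 400 Hz search range, and the choice of octave shifts (halving or doubling) to move a candidate into the indicated region. The coarse pitch feature passed to `classify_frame` stands in for the information the paper extracts from the first decomposition modes; the decomposition itself is not implemented.

```python
def classify_frame(pitch_feature_hz, boundary_hz=200.0):
    """Label a frame "LF" or "HF" from a coarse pitch feature.

    Assumption: the feature is a rough frequency in Hz; the 200 Hz
    boundary is a hypothetical value, not one taken from the paper.
    """
    return "LF" if pitch_feature_hz < boundary_hz else "HF"


def correct_f0(candidate_hz, frame_class, boundary_hz=200.0,
               f0_min=50.0, f0_max=400.0):
    """Shift an F0 candidate by octaves into the region given by frame_class.

    Assumption: errors of the conventional detector are treated as octave
    errors, so the candidate is halved (LF) or doubled (HF) until it falls
    on the expected side of the boundary or would leave [f0_min, f0_max].
    """
    if frame_class not in ("LF", "HF"):
        raise ValueError("frame_class must be 'LF' or 'HF'")
    if candidate_hz <= 0:
        return 0.0  # treat non-positive candidates as unvoiced
    f0 = float(candidate_hz)
    if frame_class == "LF":
        # Halve while the candidate sits in the HF region.
        while f0 >= boundary_hz and f0 / 2 >= f0_min:
            f0 /= 2
    else:
        # Double while the candidate sits in the LF region.
        while f0 < boundary_hz and f0 * 2 <= f0_max:
            f0 *= 2
    return f0


if __name__ == "__main__":
    # A detector reports 240 Hz, but the frame was labelled LF.
    label = classify_frame(115.0)
    print(label, correct_f0(240.0, label))
```

In this sketch a candidate that already lies in the indicated region is returned unchanged, so the correction only acts on frames where the detector and the LF/HF label disagree.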
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
