Voting-based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection
Junya Koguchi, Tomoki Koriyama

TL;DR
This paper enhances voting-based fundamental frequency estimation by providing a theoretical foundation, introducing alignment and selection improvements, and demonstrating superior robustness and accuracy across diverse audio datasets.
Contribution
The paper offers a theoretical analysis of the voting method, proposes a pre-voting alignment procedure and a greedy, error-correlation-based estimator selection, and improves the robustness of fundamental frequency estimation in various audio conditions.
Findings
With alignment, outperforms individual state-of-the-art estimators in clean conditions
Maintains robust voiced/unvoiced detection in noisy environments
Provides a theoretical basis for voting method effectiveness
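The two theoretical ingredients named in the abstract, error variance reduction and Condorcet's jury theorem, can be illustrated with textbook formulas. The sketch below is not the paper's derivation. It assumes voters that are independent and equally accurate for the voiced/unvoiced decision, and estimator errors with equal variance and equal pairwise correlation for the averaging case. The function names `majority_correct_prob` and `mean_error_variance` are hypothetical.

```python
from math import comb


def majority_correct_prob(n: int, p: float) -> float:
    """Probability that a strict majority of n voters is correct.

    Assumption: voters are independent and each is correct with
    probability p (the classical Condorcet jury setting).
    """
    k_min = n // 2 + 1  # smallest number of correct votes forming a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def mean_error_variance(n: int, sigma2: float, rho: float = 0.0) -> float:
    """Variance of the average of n estimator errors.

    Assumption: every error has variance sigma2 and every pair of
    errors has the same correlation coefficient rho.
    """
    return sigma2 * (1 + (n - 1) * rho) / n


if __name__ == "__main__":
    # Three voters, each 80% accurate: the majority is right more often.
    print(majority_correct_prob(3, 0.8))
    # Uncorrelated errors shrink the variance by 1/n; correlated errors shrink it less.
    print(mean_error_variance(4, 1.0, rho=0.0), mean_error_variance(4, 1.0, rho=0.5))
```

Under these assumptions, the second function also suggests why selecting estimators by error correlation can matter: the more correlated the errors, the less the ensemble gains from adding members.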
Abstract
The voting method, an ensemble approach for fundamental frequency estimation, is empirically known for its robustness but lacks thorough investigation. This paper provides a principled analysis and improvement of this technique. First, we offer a theoretical basis for its effectiveness, explaining the error variance reduction for fundamental frequency estimation and invoking Condorcet's jury theorem for voiced/unvoiced detection accuracy. To address its practical limitations, we propose two key improvements: 1) a pre-voting alignment procedure to correct temporal and frequential biases among estimators, and 2) a greedy algorithm to select a compact yet effective subset of estimators based on error correlation. Experiments on a diverse dataset of speech, singing, and music show that our proposed method with alignment outperforms individual state-of-the-art estimators in clean conditions…
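As a rough illustration of what a voting-based estimator does per frame, the following sketch combines several F0 tracks by a majority vote on voicing and a median over the voiced votes. This is a minimal sketch under stated assumptions, not the authors' implementation: the use of the median, the convention that `0.0` marks an unvoiced frame, the tie rule (ties count as unvoiced), and the names `vote_frame` and `vote_track` are all assumptions. The alignment and selection steps proposed in the paper are not shown.

```python
from statistics import median


def vote_frame(f0_estimates):
    """Combine one frame of F0 estimates (Hz) from several estimators.

    Assumption: a value of 0.0 (or None) is an 'unvoiced' vote.
    The frame is voiced only if a strict majority votes voiced;
    the output F0 is then the median of the voiced votes.
    """
    voiced = [f for f in f0_estimates if f is not None and f > 0]
    if 2 * len(voiced) <= len(f0_estimates):
        return 0.0  # no strict majority for 'voiced'
    return median(voiced)


def vote_track(tracks):
    """Apply vote_frame across equal-length per-estimator F0 tracks."""
    return [vote_frame(list(frame)) for frame in zip(*tracks)]


if __name__ == "__main__":
    tracks = [
        [100.0, 0.0, 220.0],  # hypothetical estimator A
        [102.0, 0.0, 0.0],    # hypothetical estimator B
        [200.0, 150.0, 0.0],  # hypothetical estimator C (octave error in frame 0)
    ]
    print(vote_track(tracks))
```

In the first frame the median discards the octave error from the third track, which is the kind of robustness the voting method is empirically known for.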
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
