Voting-based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection

Junya Koguchi; Tomoki Koriyama

arXiv:2602.01727 · cs.SD · February 3, 2026

Open Access

TL;DR

This paper enhances voting-based fundamental frequency estimation by providing a theoretical foundation, introducing alignment and selection improvements, and demonstrating superior robustness and accuracy across diverse audio datasets.

Contribution

The paper offers a theoretical analysis of the voting method, proposes a pre-voting alignment procedure and a correlation-aware estimator selection algorithm, and improves the robustness of fundamental frequency estimation across speech, singing, and music.

Findings

01. Outperforms state-of-the-art estimators in clean conditions

02. Maintains robust voiced/unvoiced detection in noisy environments

03. Provides a theoretical basis for voting method effectiveness

Abstract

The voting method, an ensemble approach for fundamental frequency estimation, is empirically known for its robustness but lacks thorough investigation. This paper provides a principled analysis and improvement of this technique. First, we offer a theoretical basis for its effectiveness, explaining the error variance reduction for fundamental frequency estimation and invoking Condorcet's jury theorem for voiced/unvoiced detection accuracy. To address its practical limitations, we propose two key improvements: 1) a pre-voting alignment procedure to correct temporal and frequential biases among estimators, and 2) a greedy algorithm to select a compact yet effective subset of estimators based on error correlation. Experiments on a diverse dataset of speech, singing, and music show that our proposed method with alignment outperforms individual state-of-the-art estimators in clean conditions…
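The two theoretical ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `majority_correct_prob` and `vote_frame` are hypothetical, the estimators are assumed to err independently (the idealized setting of Condorcet's jury theorem), an F0 of 0.0 is assumed to mark an unvoiced frame, and the median is assumed as the aggregation rule, which may differ from the rule used in the paper.

```python
import math
from statistics import median


def majority_correct_prob(n: int, p: float) -> float:
    """Probability that a strict majority of n voters is correct.

    Assumes each voter is independently correct with probability p
    (the idealized Condorcet setting). Use an odd n to avoid ties.
    """
    k_min = n // 2 + 1  # smallest number of correct votes forming a majority
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


def vote_frame(f0_estimates: list[float]) -> float:
    """Combine per-estimator F0 values (Hz) for a single frame.

    Assumption: 0.0 encodes an unvoiced decision. The frame is declared
    voiced only if a strict majority of estimators say so; the voted F0
    is then the median of the voiced estimates (an assumed rule).
    """
    voiced = [f for f in f0_estimates if f > 0.0]
    if 2 * len(voiced) <= len(f0_estimates):
        return 0.0  # majority says unvoiced (or tie)
    return median(voiced)


if __name__ == "__main__":
    # With p > 0.5, majority accuracy grows as more estimators are added.
    for n in (1, 3, 5, 9):
        print(n, round(majority_correct_prob(n, 0.8), 4))
    # The median ignores a single octave error among three estimators.
    print(vote_frame([220.0, 221.0, 440.0]))
```

Under these assumptions, the first function shows why adding estimators that are each better than chance improves the voiced/unvoiced decision, and the second shows how a robust aggregate suppresses an outlier such as an octave error. Correlated errors weaken both effects, which is the motivation the abstract gives for selecting estimators by error correlation.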

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing