Robust Pitch Estimation and Tracking for Speakers Based on Subband Encoding and the Generalized Labeled Multi-Bernoulli Filter
Shoufeng Lin

TL;DR
This paper introduces a robust pitch estimation and tracking method using subband encoding and a GLMB filter, demonstrating improved accuracy and noise robustness in various acoustic environments.
Contribution
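It presents a novel pitch estimator and tracker that combine subband decomposition, a new frequency coverage metric for choosing the number of subbands and their center frequencies, and a GLMB filter with a pitch state transition model based on the Ornstein-Uhlenbeck process.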
Findings
Achieves higher accuracy than existing methods in noisy conditions.
Demonstrates robustness against reverberation in real recordings.
Outperforms state-of-the-art pitch estimation techniques in experiments.
Abstract
This paper proposes a new pitch estimator and a novel pitch tracker for speakers. We first decompose the sound signal into subbands using an auditory filterbank, assuming time-frequency sparsity of human speech. Instead of selecting the number of subbands empirically, we propose a novel frequency coverage metric to derive the number of subbands and the center frequencies of the filterbank. The subband signals are then encoded in a manner inspired by the computational auditory scene analysis (CASA) approach, and the normalized autocorrelations are calculated for pitch estimation. To suppress spurious errors and track the speaker identity, the temporal continuity constraint is exploited and a Generalized Labeled Multi-Bernoulli (GLMB) filter is adapted for pitch tracking, where we use a novel pitch state transition model based on the Ornstein-Uhlenbeck process, and the…
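The normalized-autocorrelation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator: it works on a single full-band frame rather than on encoded subband signals, and the function names (normalized_autocorrelation, estimate_pitch) and the default 60-400 Hz search range are assumptions made for illustration only.

```python
import numpy as np


def normalized_autocorrelation(frame, max_lag):
    """Return r[lag] for lag = 0..max_lag, each value normalized to [-1, 1].

    Hypothetical helper: each lag compares a fixed-length reference window
    with a shifted window and divides by the geometric mean of their energies.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame) - max_lag  # samples available to every comparison window
    if n <= 0:
        raise ValueError("frame must be longer than max_lag")
    ref = frame[:n]
    ref_energy = np.dot(ref, ref)
    r = np.zeros(max_lag + 1)
    for lag in range(max_lag + 1):
        seg = frame[lag:lag + n]
        denom = np.sqrt(ref_energy * np.dot(seg, seg))
        r[lag] = np.dot(ref, seg) / denom if denom > 0 else 0.0
    return r


def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Pick the lag with the highest normalized autocorrelation.

    The search range [f_min, f_max] is an assumed typical speech range,
    not a value taken from the paper. Returns (pitch_hz, peak_value).
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    r = normalized_autocorrelation(frame, max_lag)
    best_lag = min_lag + int(np.argmax(r[min_lag:max_lag + 1]))
    return sample_rate / best_lag, r[best_lag]
```

The peak value returned alongside the pitch lies in [-1, 1] and can serve as a rough voicing confidence. A tracker such as the GLMB filter described above would then operate on these per-frame estimates, which this sketch does not attempt.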
