Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis
Hideki Kawahara, Yannis Agiomyrgiannakis, Heiga Zen

TL;DR
This paper presents a new framework for F0 and aperiodicity analysis that substantially improves the tracking of temporally varying F0 trajectories, aimed at high-quality speech synthesis and modification.
Contribution
It introduces a flexible framework built from three subsystems, together with a new aperiodicity measure; a preliminary implementation reduced RMS F0 estimation error by a factor of 10 compared with existing F0 extractors.
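The factor-of-10 figure refers to RMS F0 estimation error. A minimal sketch of how such a ratio could be computed, assuming errors are measured in Hz over frames that both tracks mark as voiced (F0 > 0); the summary does not state the unit or voicing rule the authors used. The function name `rms_f0_error` and all numbers below are hypothetical, not results from the paper.

```python
import numpy as np

def rms_f0_error(f0_est, f0_ref):
    """RMS F0 error in Hz over frames voiced (F0 > 0) in both tracks (assumed rule)."""
    f0_est = np.asarray(f0_est, dtype=float)
    f0_ref = np.asarray(f0_ref, dtype=float)
    voiced = (f0_est > 0) & (f0_ref > 0)
    err = f0_est[voiced] - f0_ref[voiced]
    return float(np.sqrt(np.mean(err ** 2)))

# Hypothetical tracks (Hz); 0.0 marks an unvoiced frame.
ref = np.array([100.0, 110.0, 120.0, 0.0])
baseline = np.array([102.0, 108.0, 122.0, 0.0])
proposed = np.array([100.2, 109.8, 120.2, 0.0])

# Improvement factor = baseline RMS error / proposed RMS error.
ratio = rms_f0_error(baseline, ref) / rms_f0_error(proposed, ref)
```

Here the baseline is off by 2 Hz per voiced frame and the hypothetical proposed track by 0.2 Hz, so `ratio` comes out at about 10. Measuring in cents instead of Hz would weight errors at high and low F0 equally, which is a common alternative.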
Findings
Outperforms existing F0 extractors in tracking temporally varying F0 trajectories
Uses a complex-valued wavelet analysis filter for aperiodicity detection
Provides an aperiodicity measure that is less sensitive to slow FM and AM
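The front end pairs a complex-valued wavelet filter with an instantaneous frequency estimator. A minimal sketch of the general idea, assuming a Gaussian-enveloped complex exponential as the filter (the paper's actual envelope is not given in this summary): the instantaneous frequency of the filter output is the rate of change of its phase, estimated here from the phase advance between adjacent samples. The names `complex_gabor_filter` and `instantaneous_frequency` and the `n_cycles` width parameter are hypothetical.

```python
import numpy as np

def complex_gabor_filter(fc, fs, n_cycles=4.0):
    """Gaussian-enveloped complex exponential centered on fc (Hz).

    Hypothetical stand-in for the paper's complex-valued wavelet; the
    envelope shape and width (n_cycles) are assumptions.
    """
    sigma = n_cycles / (2.0 * fc)            # envelope std. dev. in seconds
    half = int(np.ceil(4.0 * sigma * fs))    # truncate at +/- 4 sigma
    t = np.arange(-half, half + 1) / fs
    return np.exp(-0.5 * (t / sigma) ** 2) * np.exp(2j * np.pi * fc * t)

def instantaneous_frequency(x, fc, fs):
    """Instantaneous frequency (Hz) of the filter output, from its phase advance."""
    y = np.convolve(x, complex_gabor_filter(fc, fs), mode="same")
    # angle(y[n+1] * conj(y[n])) is the phase advance per sample, in radians
    dphi = np.angle(y[1:] * np.conj(y[:-1]))
    return dphi * fs / (2.0 * np.pi)

fs = 16000
t = np.arange(fs) / fs
x = np.cos(2.0 * np.pi * 200.0 * t)          # 1 s steady 200 Hz tone
f_inst = instantaneous_frequency(x, fc=220.0, fs=fs)
est = float(np.median(f_inst[4000:12000]))   # skip filter edge effects
```

Even though the filter is centered at 220 Hz, `est` lands near the tone's 200 Hz: the phase of the output follows the component inside the passband, not the channel center. This is what lets a bank of such filters report where a harmonic actually lies.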
Abstract
This paper introduces a general and flexible framework for F0 and aperiodicity (additive non-periodic component) analysis, specifically intended for high-quality speech synthesis and modification applications. The proposed framework consists of three subsystems: an instantaneous frequency estimator and initial aperiodicity detector, an F0 trajectory tracker, and an F0 refinement and aperiodicity extractor. A preliminary implementation of the proposed framework substantially outperformed existing F0 extractors (by a factor of 10 in terms of RMS F0 estimation error) in its ability to track temporally varying F0 trajectories. The front-end aperiodicity detector consists of a complex-valued wavelet analysis filter with a highly selective temporal and spectral envelope. This front-end detector uses a new measure that quantifies the deviation from periodicity. The measure is less sensitive…
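The abstract names an F0 trajectory tracker as the second subsystem but does not describe its algorithm here. As a hedged illustration, the sketch below swaps in a generic dynamic-programming (Viterbi) tracker, a standard technique and not the paper's tracker: it picks one F0 candidate per frame by trading a local cost (for example, a per-candidate aperiodicity score) against a penalty on frame-to-frame jumps in log F0. The function `track_f0`, the `jump_weight` parameter, and the candidate data are hypothetical.

```python
import numpy as np

def track_f0(candidates, scores, jump_weight=10.0):
    """Choose one F0 candidate per frame by Viterbi search (generic stand-in).

    candidates:  (T, K) array of F0 candidates in Hz
    scores:      (T, K) local costs, lower is better (assumed, e.g. aperiodicity)
    jump_weight: penalty per octave of F0 change between adjacent frames
    """
    cand = np.asarray(candidates, dtype=float)
    local = np.asarray(scores, dtype=float)
    T, K = cand.shape
    log_f = np.log2(cand)
    cost = local[0].copy()                  # best cumulative cost per candidate
    back = np.zeros((T, K), dtype=int)      # backpointers to previous frame
    for t in range(1, T):
        # transition[i, j]: cost of moving from candidate i at t-1 to j at t
        transition = jump_weight * np.abs(log_f[t][None, :] - log_f[t - 1][:, None])
        total = cost[:, None] + transition
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(K)] + local[t]
    # Trace the cheapest path backwards from the last frame.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmin(cost))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return cand[np.arange(T), path]

# Frame 2 locally prefers an octave-down error (51 Hz) over 104 Hz.
cands = [[100.0, 200.0], [102.0, 204.0], [51.0, 104.0], [106.0, 212.0]]
scores = [[0.1, 0.5], [0.1, 0.5], [0.2, 0.3], [0.1, 0.5]]
f0_path = track_f0(cands, scores)
```

The jump penalty makes the smooth path 100, 102, 104, 106 Hz cheaper overall than accepting the locally preferred 51 Hz candidate, which is the kind of octave error a trajectory tracker is meant to suppress. Measuring jumps in octaves (log2) keeps the penalty independent of the speaker's F0 range.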
