Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra
Thomas Drugman, Yannis Stylianou

TL;DR
This paper introduces a Maximum Voiced Frequency (MVF) estimation method that combines amplitude and phase spectra, improving accuracy, especially for high-pitched voices, and enhancing speech synthesis quality.
Contribution
It presents a new MVF estimation approach utilizing phase information alongside amplitude spectra, outperforming existing methods in speech and singing voice synthesis.
Findings
Superior performance in objective evaluations
Significant perceptual improvements in high-pitched voices
Outperforms state-of-the-art MVF estimation methods
Abstract
Maximum Voiced Frequency (MVF) is used in various speech models as the spectral boundary separating periodic and aperiodic components during the production of voiced sounds. Recent studies have shown that its proper estimation and modeling enhance the quality of statistical parametric speech synthesizers. In contrast, these same MVF estimation methods have been reported to degrade the performance of singing voice synthesizers. This paper proposes a new approach for MVF estimation which exploits both amplitude and phase spectra. It is shown that phase conveys relevant information about the harmonicity of the voice signal, and that it can be used jointly with features derived from the amplitude spectrum. This information is further integrated into a maximum likelihood criterion which provides a decision about the MVF estimate. The proposed technique is compared to two…
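To make the idea of a maximum likelihood decision over a spectral boundary concrete, here is a minimal sketch. It is not the paper's algorithm: the per-harmonic "harmonicity" score, the two Gaussian class models and their parameters, and the function names `gaussian_logpdf` and `estimate_mvf` are all assumptions chosen for illustration. The sketch assumes each harmonic of the fundamental frequency already has one score (for example, derived from amplitude and phase features) and picks the boundary that best splits the harmonics into a periodic band below and an aperiodic band above.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian; used as a stand-in class model."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def estimate_mvf(scores, f0, harmonic_model=(1.0, 0.3), noise_model=(0.0, 0.3)):
    """Return a boundary frequency (Hz) maximizing a simple likelihood criterion.

    scores[i] is a hypothetical harmonicity score for harmonic i + 1.
    harmonic_model and noise_model are illustrative (mean, std) pairs,
    not values taken from the paper.
    """
    ll_harm = [gaussian_logpdf(s, *harmonic_model) for s in scores]
    ll_noise = [gaussian_logpdf(s, *noise_model) for s in scores]

    # Candidate k = 0: every harmonic is explained by the noise model.
    running = sum(ll_noise)
    best_k, best_ll = 0, running

    # Moving the boundary up by one harmonic swaps that harmonic's
    # noise log-likelihood for its harmonic log-likelihood.
    for k in range(1, len(scores) + 1):
        running += ll_harm[k - 1] - ll_noise[k - 1]
        if running > best_ll:
            best_k, best_ll = k, running

    # The boundary is placed at the last harmonic judged periodic.
    return best_k * f0


if __name__ == "__main__":
    # Four clearly periodic harmonics followed by four noisy ones.
    demo_scores = [0.9, 1.1, 0.95, 0.8, 0.1, -0.05, 0.2, 0.0]
    print(estimate_mvf(demo_scores, f0=200))
```

The running-sum form evaluates every candidate boundary in a single pass. A practical estimator would also need learned class models and some smoothing of the estimate across frames, neither of which is shown here.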
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
