Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation
Onur Babacan, Thomas Drugman, Tuomo Raitio, Daniel Erro, Thierry Dutoit

TL;DR
This paper subjectively compares four parametric vocoder techniques for singing voice synthesis, studies how their behavior depends on singer type (baritone, counter-tenor, soprano), and discusses the artifacts that arise in high-pitched voices along with possible ways to overcome them.
Contribution
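It provides a comparative subjective evaluation of four parametric representations for singing voice synthesis (traditional pulse vocoder, Deterministic plus Stochastic Model, Harmonic plus Noise Model and GlottHMM) and discusses artifacts in high-pitched voices, suggesting possible approaches to overcome them.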
Findings
Vocoder performance varies with singer type (baritone, counter-tenor, soprano).
Artifacts are prominent in high-pitched voices.
Some techniques suit certain voice types better than others.
Abstract
Various parametric representations have been proposed to model the speech signal. While the performance of such vocoders is well known in the context of speech processing, their extrapolation to singing voice synthesis might not be straightforward. The goal of this paper is twofold. First, a comparative subjective evaluation is performed across four existing techniques suitable for statistical parametric synthesis: the traditional pulse vocoder, the Deterministic plus Stochastic Model, the Harmonic plus Noise Model and GlottHMM. The behavior of these techniques as a function of the singer type (baritone, counter-tenor and soprano) is studied. Second, the artifacts occurring in high-pitched voices are discussed and possible approaches to overcome them are suggested.
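As a rough illustration of the simplest technique named in the abstract, the sketch below generates the impulse-train excitation that a traditional pulse vocoder uses for voiced sounds. It is not taken from the paper: the function names, the 16 kHz sampling rate and the two F0 values are assumptions chosen for illustration. The harmonic count is included because sparse harmonics at high F0 are one commonly cited difficulty for parametric models; the paper's own analysis of high-pitch artifacts may emphasize other causes.

```python
def pulse_train(f0_hz, sample_rate_hz, n_samples):
    """Return a unit-impulse excitation with one pulse per pitch period.

    Hypothetical helper for illustration; a real vocoder would filter this
    excitation with a spectral envelope and mix in noise for unvoiced parts.
    """
    if f0_hz <= 0 or sample_rate_hz <= 0:
        raise ValueError("f0_hz and sample_rate_hz must be positive")
    excitation = [0.0] * n_samples
    period = sample_rate_hz / f0_hz  # pitch period in samples (may be fractional)
    k = 0
    while True:
        idx = int(round(k * period))  # nearest sample to the k-th pulse
        if idx >= n_samples:
            break
        excitation[idx] = 1.0
        k += 1
    return excitation


def harmonics_up_to_nyquist(f0_hz, sample_rate_hz):
    """Count the harmonics of f0_hz lying at or below the Nyquist frequency."""
    return int((sample_rate_hz / 2) // f0_hz)


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate
    for f0 in (100, 1000):  # illustrative low and high pitches, not from the paper
        pulses = sum(pulse_train(f0, sr, sr))  # pulses in one second
        print(f0, "Hz:", int(pulses), "pulses/s,",
              harmonics_up_to_nyquist(f0, sr), "harmonics up to Nyquist")
```

Under these assumptions, a tenfold increase in F0 leaves one tenth as many harmonics to sample the spectral envelope, which gives an intuition for why a representation tuned on speech may not carry over directly to high-pitched singing.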
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
