Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation

Onur Babacan; Thomas Drugman; Tuomo Raitio; Daniel Erro; Thierry Dutoit

arXiv:2006.04142·eess.AS·June 9, 2020·1 cites

Open Access

TL;DR

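This paper subjectively compares four parametric vocoders for singing voice synthesis (traditional pulse vocoder, Deterministic plus Stochastic Model, Harmonic plus Noise Model, and GlottHMM) across three singer types (baritone, counter-tenor, and soprano), and discusses the artifacts that occur in high-pitched voices along with possible ways to overcome them.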

Contribution

It provides a comparative subjective evaluation of four parametric vocoders applied to singing voice and discusses the artifacts that occur in high-pitched voices, suggesting possible approaches to overcome them.

Findings

01. Performance varies with singer type.

02. Artifacts are prominent in high-pitched voices.

03. Some techniques handle different voice types better.

Abstract

Various parametric representations have been proposed to model the speech signal. While the performance of such vocoders is well known in the context of speech processing, their extrapolation to singing voice synthesis might not be straightforward. The goal of this paper is twofold. First, a comparative subjective evaluation is performed across four existing techniques suitable for statistical parametric synthesis: the traditional pulse vocoder, the Deterministic plus Stochastic Model, the Harmonic plus Noise Model, and GlottHMM. The behavior of these techniques as a function of the singer type (baritone, counter-tenor, and soprano) is studied. Second, the artifacts occurring in high-pitched voices are discussed, and possible approaches to overcome them are suggested.

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis