# Adversarially Trained End-to-end Korean Singing Voice Synthesis System

**Authors:** Juheon Lee, Hyeong-Seok Choi, Chang-Bin Jeon, Junghyun Koo, Kyogu Lee

arXiv: 1908.01919 · 2019-08-07

## TL;DR

This paper introduces an end-to-end Korean singing voice synthesis system that generates singing from lyrics and a symbolic melody. It combines phonetic enhancement masking, local conditioning of text and pitch in the super-resolution network, and conditional adversarial training.

## Contribution

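The paper presents a Korean singing voice synthesis system built on three techniques: phonetic enhancement masking, local conditioning of text and pitch, and conditional adversarial training. The masking targets phonetic control in the mel-synthesis network; the other two target realism in the super-resolution network.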

## Key findings

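- Phonetic enhancement masking, which generates implicit formant masks from the input text alone, gives more accurate phonetic control of the synthesized singing voice.
- Local conditioning of text and pitch, together with conditional adversarial training, is reported as crucial for realistic output from the super-resolution network.
- Quantitative and qualitative evaluations support the validity of all three proposed methods.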

## Abstract

In this paper, we propose an end-to-end Korean singing voice synthesis system from lyrics and a symbolic melody using the following three novel approaches: 1) phonetic enhancement masking, 2) local conditioning of text and pitch to the super-resolution network, and 3) conditional adversarial training. The proposed system consists of two main modules: a mel-synthesis network that generates a mel-spectrogram from the given input information, and a super-resolution network that upsamples the generated mel-spectrogram into a linear-spectrogram. In the mel-synthesis network, phonetic enhancement masking is applied to generate implicit formant masks solely from the input text, which enables more accurate phonetic control of the singing voice. In addition, we show that two other proposed methods -- local conditioning of text and pitch, and conditional adversarial training -- are crucial for a realistic generation of the human singing voice in the super-resolution process. Finally, both quantitative and qualitative evaluations are conducted, confirming the validity of all proposed methods.
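
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-matrix "networks", the bin counts (`N_MEL`, `N_LINEAR`), the embedding size, and every function name are hypothetical stand-ins chosen only to show the shapes. The sketch also assumes the phonetic enhancement mask is applied by element-wise multiplication, which the abstract does not state, and it omits the discriminator used for conditional adversarial training.

```python
import numpy as np

# Hypothetical sizes, chosen for illustration only (not taken from the paper).
N_MEL, N_LINEAR = 80, 513
FRAMES, EMB = 50, 16

rng = np.random.default_rng(0)


def phonetic_enhancement_mask(text_emb, w_mask):
    """Derive a (frames, N_MEL) mask from the text features alone."""
    return 1.0 / (1.0 + np.exp(-(text_emb @ w_mask)))  # sigmoid, values in [0, 1]


def mel_synthesis(text_emb, pitch_emb, w_mel, w_mask):
    """Stage 1: text + pitch -> mel-spectrogram, shaped by the text-only mask."""
    base = np.tanh(np.concatenate([text_emb, pitch_emb], axis=1) @ w_mel)
    mask = phonetic_enhancement_mask(text_emb, w_mask)
    return base * mask  # assumption: mask applied by element-wise multiplication


def super_resolution(mel, text_emb, pitch_emb, w_sr):
    """Stage 2: mel -> linear spectrogram, locally conditioned on text and pitch."""
    # Local conditioning here means frame-aligned text and pitch features are
    # concatenated to each mel frame before upsampling along the frequency axis.
    cond = np.concatenate([mel, text_emb, pitch_emb], axis=1)
    return np.abs(cond @ w_sr)  # magnitudes are non-negative


# Random stand-ins for frame-aligned embeddings and learned weights.
text_emb = rng.standard_normal((FRAMES, EMB))
pitch_emb = rng.standard_normal((FRAMES, EMB))
w_mel = rng.standard_normal((2 * EMB, N_MEL))
w_mask = rng.standard_normal((EMB, N_MEL))
w_sr = rng.standard_normal((N_MEL + 2 * EMB, N_LINEAR))

mask = phonetic_enhancement_mask(text_emb, w_mask)
mel = mel_synthesis(text_emb, pitch_emb, w_mel, w_mask)
linear = super_resolution(mel, text_emb, pitch_emb, w_sr)
print(mel.shape, linear.shape)
```

Under these assumptions, the mask depends only on `text_emb`, so changing the pitch input leaves it untouched. That is the sense in which the masks come "solely from the input text."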

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.01919/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1908.01919/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1908.01919/full.md

---
Source: https://tomesphere.com/paper/1908.01919