SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

Yuxun Tang; Yuning Wu; Jiatong Shi; Qin Jin

arXiv:2406.08905 · cs.SD · June 21, 2024

TL;DR

SingOMD extracts singing-oriented, multi-resolution discrete representations from speech SSL models. It targets two obstacles to singing generation: the domain gap between speech and singing, and singing's need for a more refined representation than typical speech.

Contribution

It adapts speech SSL features to singing through a resynthesis task, adds resampling-based multi-resolution modules, and discretizes the adapted features via clustering for singing voice generation.

Findings

01. Effective in singing vocoders and voice synthesis

02. Robust and efficient representation extraction

03. Improves quality of singing voice synthesis

Abstract

Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, the direct application of speech SSL models to singing generation encounters domain gaps between speech and singing. Furthermore, singing generation necessitates a more refined representation than typical speech. To address these challenges, we introduce SingOMD, a novel method to extract singing-oriented multi-resolution discrete representations from speech SSL models. Specifically, we first adapt the features from speech SSL through a resynthesis task and incorporate multi-resolution modules based on resampling to better serve singing generation. These adapted multi-resolution features are then discretized via clustering. Extensive experiments demonstrate the robustness, efficiency,…
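
The last two steps named in the abstract (resampling to several resolutions, then discretizing by clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: random vectors stand in for the adapted SSL features, the resampling is plain linear interpolation along time, the clustering is a small hand-written k-means, and the names `resample_frames`, `kmeans_tokens`, `multires_tokens`, the factors, and the codebook size are hypothetical. The resynthesis-based adaptation step is not shown.

```python
import numpy as np


def resample_frames(feats, factor):
    """Linearly interpolate a (T, D) feature matrix to round(T * factor) frames."""
    n_in, dim = feats.shape
    n_out = max(1, int(round(n_in * factor)))
    src = np.linspace(0.0, n_in - 1, num=n_out)  # fractional source positions
    grid = np.arange(n_in)
    return np.stack(
        [np.interp(src, grid, feats[:, d]) for d in range(dim)], axis=1
    )


def kmeans_tokens(feats, k, iters=20, seed=0):
    """Assign each frame to one of k centroids; returns (token ids, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = feats[rng.choice(feats.shape[0], size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = feats[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1), centroids


def multires_tokens(feats, factors=(1.0, 0.5, 0.25), k=8):
    """One token sequence per resolution (factors and k are illustrative)."""
    return {f: kmeans_tokens(resample_frames(feats, f), k)[0] for f in factors}


# Stand-in for adapted SSL features: 200 frames of 16-dimensional vectors.
ssl_feats = np.random.default_rng(0).normal(size=(200, 16))
tokens = multires_tokens(ssl_feats)
```

Each entry of `tokens` is an integer sequence whose length scales with its resampling factor, so coarser resolutions yield shorter token streams for the same audio.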

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing