CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition
Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao

TL;DR
ConTuner is a fast, high-fidelity singing voice beautifying system that corrects pitch and enhances expressiveness using a diffusion model with optimized conditions, applicable to Mandarin and English songs.
Contribution
The paper introduces ConTuner, a novel diffusion-based system that simultaneously corrects pitch and enhances expressiveness without paired data, addressing limitations of existing methods.
Findings
ConTuner achieves effective pitch correction and expressiveness enhancement.
The system performs well on both Mandarin and English songs.
Ablation studies confirm the effectiveness of the expressiveness enhancer and acceleration methods.
Abstract
Singing voice beautifying is a novel task with practical value in daily life. It aims to correct the pitch of a singing voice and improve its expressiveness without changing the original timbre and content. Existing methods either rely on paired data or concentrate only on pitch correction. However, professional and amateur recordings of the same song by the same person are hard to obtain, and singing voice beautifying involves not only pitch correction but also other aspects such as emotion and rhythm. We therefore propose ConTuner, a fast and high-fidelity singing voice beautifying system: a diffusion model combined with a modified condition to generate the beautified Mel-spectrogram, where the modified condition is composed of optimized pitch and expressiveness. For pitch correction, we establish a mapping from MIDI and spectrum envelope to pitch. To make amateur singing…
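To make the relationship between MIDI notes and pitch concrete, here is a minimal Python sketch of the standard equal-temperament conversion between a fundamental frequency in Hz and a MIDI note number. It also includes a naive "snap to score" baseline. This is not ConTuner's method: the abstract describes a learned mapping from MIDI and spectrum envelope to pitch, whereas the baseline below simply replaces each voiced frame with the score note's frequency. The function names, the 440 Hz reference for A4, and the convention that 0 marks an unvoiced frame are all assumptions made for illustration.

```python
import math

A4_HZ = 440.0   # assumed reference tuning for MIDI note 69 (A4)
A4_MIDI = 69


def hz_to_midi(f0_hz: float) -> float:
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)


def midi_to_hz(note: float) -> float:
    """Convert a MIDI note number to its frequency in Hz."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def cents_off(f0_hz: float, midi_note: float) -> float:
    """Deviation of a sung frequency from a target note, in cents."""
    return 100.0 * (hz_to_midi(f0_hz) - midi_note)


def snap_to_score(f0_hz, score_midi):
    """Naive baseline (hypothetical, not the paper's learned mapping):
    replace each voiced frame's F0 with the score note's frequency.
    Frames with F0 == 0 are treated as unvoiced and left at 0."""
    return [midi_to_hz(n) if f > 0 else 0.0 for f, n in zip(f0_hz, score_midi)]
```

A hard snap like this flattens vibrato and note transitions, which is one reason a system might predict pitch from the score plus acoustic features instead of quantizing it directly.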
Taxonomy
Topics: Phonetics and Phonology Research · Speech and Audio Processing · Speech Recognition and Synthesis
