KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke

Xiaobin Zhuang; Huiran Yu; Weifeng Zhao; Tao Jiang; Peng Hu

arXiv:2110.09121 · cs.SD · June 28, 2022 · 1 citation



TL;DR

KaraTuner is an end-to-end neural system that automatically corrects pitch in singing voices for karaoke, ensuring naturalness and sound quality through integrated pitch prediction and vocoding.

Contribution

It introduces a novel neural architecture combining pitch prediction and vocoding for seamless, end-to-end pitch correction in singing voices.
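The abstract describes the vocoder as pitch-controllable and built on a source-filter block, but this page does not give that block's details. As a rough illustration of the "source" half of a source-filter design, the sketch below turns a frame-level f0 curve into a harmonic sine excitation, in the spirit of neural source-filter vocoders. The function name, sample rate, hop length and harmonic count are hypothetical choices, not KaraTuner's configuration; a learned filter network would then shape this excitation using the vocal spectrum.

```python
import math

def harmonic_excitation(f0_frames, sample_rate=16000, hop_length=160,
                        n_harmonics=4, amplitude=0.1):
    """Hypothetical sketch: frame-level f0 (Hz) -> sample-level harmonic excitation.

    Each frame is held for hop_length samples. Voiced frames (f0 > 0) emit a sum of
    sine harmonics; unvoiced frames emit silence (real vocoders usually inject noise).
    """
    two_pi = 2.0 * math.pi
    phase = 0.0
    samples = []
    for f0 in f0_frames:
        for _ in range(hop_length):
            if f0 > 0:
                # Accumulate (and wrap) phase so the waveform stays continuous
                # when the tuned pitch changes from one frame to the next.
                phase = (phase + two_pi * f0 / sample_rate) % two_pi
                total = sum(math.sin(k * phase) for k in range(1, n_harmonics + 1))
                samples.append(amplitude * total / n_harmonics)
            else:
                samples.append(0.0)
    return samples


excitation = harmonic_excitation([220.0, 220.0, 0.0, 246.94])
print(len(excitation))  # 640 samples = 4 frames x 160
```

Accumulating phase sample by sample, rather than restarting it in every frame, avoids clicks at frame boundaries when the tuned pitch moves, which is one reason source signals are usually built this way.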

Findings

1. Outperforms rule-based pitch correction in preference tests.
2. Achieves higher timbre consistency and sound quality than existing vocoders.
3. Demonstrates effective long-term dependency modeling with a Transformer.
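The third finding credits the Transformer with modeling long-term dependencies. Its core mechanism is self-attention, in which every output frame is a weighted mix of all input frames. The following is a minimal single-head sketch with identity query/key/value projections, a simplification assumed here for brevity; a feed-forward Transformer as used in the pitch predictor would also include learned projections, multiple heads and position-wise feed-forward layers, none of which are shown.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(frames):
    """Single-head scaled dot-product self-attention (identity Q/K/V projections).

    frames: list of T feature vectors, each of length d.
    Returns (outputs, attention) where attention[i][j] is how much frame i
    attends to frame j.
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    outputs, attention = [], []
    for query in frames:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in frames]
        weights = softmax(scores)
        attention.append(weights)
        # Each output frame mixes ALL input frames, near or far, in one step.
        outputs.append([sum(w * frame[j] for w, frame in zip(weights, frames))
                        for j in range(d)])
    return outputs, attention


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, attn = self_attention(frames)
```

Here `attn[0][2]` is nonzero, so the first frame draws information directly from the last one regardless of how far apart they are, whereas a stack of convolutions only widens its receptive field layer by layer.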

Abstract

An automatic pitch correction system typically includes several stages, such as pitch extraction, deviation estimation, pitch shift processing, and cross-fade smoothing. However, designing these components with rule-based strategies often requires domain expertise, and they are likely to fail on corner cases. In this paper, we present KaraTuner, an end-to-end neural architecture that predicts the pitch curve and resynthesizes the singing voice directly from the tuned pitch and the vocal spectrum extracted from the original recordings. Several vital technical points have been introduced in KaraTuner to ensure pitch accuracy, pitch naturalness, timbre consistency, and sound quality. A feed-forward Transformer is employed in the pitch predictor to capture long-term dependencies in the vocal spectrum and musical note. We also develop a pitch-controllable vocoder based on a novel source-filter block and the…
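For contrast, the abstract lists the stages of a conventional rule-based pipeline. Two of them, deviation estimation and cross-fade smoothing, can be sketched as follows under stated assumptions: the target note is already known as a MIDI note number, deviation is measured in cents with A4 = 440 Hz, and smoothing is a plain linear cross-fade. All function names are hypothetical and none of this is taken from the paper; it only illustrates the kind of hand-designed logic that KaraTuner replaces with a learned model.

```python
import math

def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number, A4 = 440 Hz = 69."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)

def midi_to_hz(note):
    """Inverse of hz_to_midi."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)

def deviation_cents(f0_hz, target_note):
    """Deviation estimation: how far the sung pitch is from the target, in cents."""
    return 100.0 * (hz_to_midi(f0_hz) - target_note)

def shift_ratio(f0_hz, target_note):
    """Frequency ratio a pitch shifter would apply to land exactly on the target."""
    return midi_to_hz(target_note) / f0_hz

def linear_crossfade(a, b):
    """Cross-fade smoothing: fade segment a out while fading segment b in."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("segments must have equal length >= 2")
    return [(1.0 - i / (n - 1)) * x + (i / (n - 1)) * y
            for i, (x, y) in enumerate(zip(a, b))]
```

A sung 220 Hz against target note 57 (A3) gives a deviation of 0 cents and a shift ratio of 1.0. Choices such as a correction threshold in cents or a fixed fade length are exactly the kind of tuned rules that the abstract says can fail on corner cases.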


Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies