A Data-Driven Approach to Smooth Pitch Correction for Singing Voice in Pop Music
Sanna Wager, Lijiang Guo, Aswin Sivaraman, Minje Kim

TL;DR
This paper introduces a machine learning method using RNNs for continuous, context-aware pitch correction in singing, improving naturalness and expressiveness in karaoke and autotuning applications.
Contribution
It presents a novel data-driven approach that directly predicts continuous pitch shifts from harmonic features, bypassing traditional discrete note mapping.
Findings
Outperforms traditional methods in preserving vibrato and pitch expression.
Capable of real-time, continuous pitch correction.
Extensible to unsupervised autotuning.
Abstract
In this paper, we present a machine-learning approach to pitch correction for voice in a karaoke setting, where the vocals and accompaniment are on separate tracks and time-aligned. The network takes as input the time-frequency representation of the two tracks and predicts the amount of pitch-shifting in cents required to make the voice sound in-tune with the accompaniment. It is trained on examples of semi-professional singing. The proposed approach differs from existing real-time pitch correction methods by replacing pitch tracking and mapping to a discrete set of notes (for example, the twelve classes of the equal-tempered scale) with learning a correction that is continuous both in frequency and in time directly from the harmonics of the vocal and accompaniment tracks. A Recurrent Neural Network (RNN) model provides a correction that takes context into account, preserving…
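To make the units concrete, here is a minimal sketch of the cents arithmetic and of the discrete note-snapping baseline that the abstract says the proposed method replaces. It is not the paper's model. The function names and the 440 Hz reference tuning are assumptions introduced for illustration only.

```python
import math

A4_HZ = 440.0  # assumed reference tuning, not stated in the abstract


def cents_between(f_from_hz, f_to_hz):
    """Signed interval from f_from_hz to f_to_hz in cents (1200 per octave)."""
    return 1200.0 * math.log2(f_to_hz / f_from_hz)


def shift_by_cents(f_hz, cents):
    """Frequency obtained by shifting f_hz by the given number of cents."""
    return f_hz * 2.0 ** (cents / 1200.0)


def snap_to_equal_tempered(f_hz, ref_hz=A4_HZ):
    """Nearest twelve-tone equal-tempered pitch (the discrete baseline)."""
    semitones = round(12.0 * math.log2(f_hz / ref_hz))
    return ref_hz * 2.0 ** (semitones / 12.0)


def discrete_correction_cents(f_hz, ref_hz=A4_HZ):
    """Shift in cents that a snap-to-nearest-note corrector would apply."""
    return cents_between(f_hz, snap_to_equal_tempered(f_hz, ref_hz))
```

For a sung pitch of 450 Hz, `discrete_correction_cents(450.0)` is roughly -38.9 cents, pulling the note down to 440 Hz. A snap-to-grid corrector of this kind treats every frame independently and would also flatten intentional deviations such as vibrato, which is the limitation the continuous, context-aware prediction described in the abstract is meant to address.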
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
