A Data-Driven Approach to Smooth Pitch Correction for Singing Voice in Pop Music

Sanna Wager; Lijiang Guo; Aswin Sivaraman; Minje Kim

arXiv:1805.02603 · cs.SD · May 8, 2018 · 1 citation

TL;DR

This paper introduces a machine learning method using RNNs for continuous, context-aware pitch correction in singing, improving naturalness and expressiveness in karaoke and autotuning applications.

Contribution

It presents a novel data-driven approach that directly predicts continuous pitch shifts from harmonic features, bypassing traditional discrete note mapping.

Findings

01. Outperforms traditional methods in preserving vibrato and pitch expression.

02. Capable of real-time, continuous pitch correction.

03. Extensible to unsupervised autotuning.

Abstract

In this paper, we present a machine-learning approach to pitch correction for voice in a karaoke setting, where the vocals and accompaniment are on separate tracks and time-aligned. The network takes as input the time-frequency representation of the two tracks and predicts the amount of pitch-shifting in cents required to make the voice sound in tune with the accompaniment. It is trained on examples of semi-professional singing. The proposed approach differs from existing real-time pitch correction methods by replacing pitch tracking and mapping to a discrete set of notes (for example, the twelve classes of the equal-tempered scale) with learning a correction that is continuous both in frequency and in time directly from the harmonics of the vocal and accompaniment tracks. A Recurrent Neural Network (RNN) model provides a correction that takes context into account, preserving…
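To make the unit concrete, here is a minimal Python sketch of how a pitch shift in cents relates to frequency, together with the conventional baseline the abstract contrasts against: snapping a tracked pitch to the nearest equal-tempered note. This is not the paper's RNN model. The function names, the A4 = 440 Hz reference, and the 452 Hz example value are assumptions chosen for illustration.

```python
import math

A4_HZ = 440.0  # assumed reference tuning; the abstract does not specify one


def cents_between(f_from_hz, f_to_hz):
    """Signed interval from f_from_hz to f_to_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_to_hz / f_from_hz)


def shift_by_cents(f_hz, cents):
    """Frequency obtained by shifting f_hz by the given number of cents."""
    return f_hz * 2.0 ** (cents / 1200.0)


def snap_to_equal_tempered(f_hz, a4_hz=A4_HZ):
    """Discrete baseline: nearest note of the twelve-tone equal-tempered scale."""
    semitones_from_a4 = round(12.0 * math.log2(f_hz / a4_hz))
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


# Hypothetical sung pitch, slightly sharp of A4.
sung_hz = 452.0
target_hz = snap_to_equal_tempered(sung_hz)
correction_cents = cents_between(sung_hz, target_hz)  # negative: shift down
corrected_hz = shift_by_cents(sung_hz, correction_cents)

print(f"target: {target_hz:.2f} Hz, correction: {correction_cents:.2f} cents")
```

The discrete baseline always moves the voice onto a scale degree, which can flatten intentional deviations such as vibrato. The approach described in the abstract instead predicts the value of the correction in cents directly, as a quantity that is continuous in both frequency and time.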

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing