Singing voice correction using canonical time warping
Yin-Jyun Luo, Ming-Tso Chen, Tai-Shih Chi, Li Su

TL;DR
This paper applies canonical time warping (CTW) to singing voice correction: an amateur recording is aligned to a professional one, a new pitch contour is generated from the alignment, and the corrected voice is resynthesized with a vocoder. In a subjective test, CTW outperforms DTW and commercial auto-tuning software.
Contribution
The paper applies canonical time warping (CTW) to singing voice correction, reporting robustness to pitch-shifting and time-stretching in an objective evaluation and better subjective results than DTW and commercial auto-tuning software.
Findings
In the objective evaluation, canonical time warping (CTW) is robust against pitch-shifting and time-stretching effects.
In the subjective test, CTW outperforms DTW and commercial auto-tuning software.
The method is applicable in real-world singing correction scenarios.
Abstract
Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm that synchronizes two singing recordings can provide a promising solution. We therefore propose to address the problem with canonical time warping (CTW), which aligns amateur singing recordings to professional ones. A new pitch contour is generated from the alignment information, and a pitch-corrected singing voice is synthesized through a vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW outperforms the other methods, including DTW and commercial auto-tuning software. Finally, we demonstrate the applicability of the proposed method in a practical, real-world scenario.
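The align-then-transfer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes plain DTW for CTW (so it lacks the robustness to pitch-shifting that the paper attributes to CTW), it works directly on pitch contours in semitones, and it omits the vocoder resynthesis step. The function names `dtw_path` and `transfer_pitch` and the toy contours are hypothetical, chosen only for illustration.

```python
import numpy as np


def dtw_path(x, y):
    """Plain DTW between two 1-D sequences; returns the warping path as (i, j) pairs.

    Assumption: absolute difference as the local cost. The paper uses CTW,
    which this sketch does NOT implement.
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])

    # Backtrack from the last cell to the first along minimum-cost predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda s: cost[s])
        path.append((i - 1, j - 1))
    return path[::-1]


def transfer_pitch(amateur_f0, pro_f0):
    """Build a corrected contour on the amateur timeline from the aligned professional pitch.

    Each amateur frame receives the mean pitch of the professional frames aligned to it.
    """
    sums = np.zeros(len(amateur_f0))
    counts = np.zeros(len(amateur_f0))
    for i, j in dtw_path(amateur_f0, pro_f0):
        sums[i] += pro_f0[j]
        counts[i] += 1
    return sums / counts


# Toy example (hypothetical data): the amateur sings each note twice as long
# and 0.2 semitones sharp relative to the professional reference.
pro = np.array([60.0, 62.0, 64.0, 65.0])
amateur = np.repeat(pro, 2) + 0.2
corrected = transfer_pitch(amateur, pro)
print(corrected)
```

The corrected contour keeps the amateur's timing (eight frames) while taking its pitch values from the reference; in a full system this contour would drive a vocoder to resynthesize the voice.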
Taxonomy
Topics: Music and Audio Processing · Time Series Analysis and Forecasting · Speech Recognition and Synthesis
