TL;DR
BERT-APC is a reference-free pitch correction framework that infers musical context with a music language model to improve vocal pitch accuracy while preserving expressiveness.
Contribution
It introduces a music language model-based approach for reference-free pitch correction, enhancing naturalness and robustness in vocal performances.
Findings
BERT-APC outperformed recent singing voice transcription models in pitch prediction accuracy.
It achieved the highest mean opinion score (MOS), 4.32, surpassing Auto-Tune and Melodyne.
It delivered superior correction quality on highly detuned samples.
Abstract
Automatic Pitch Correction (APC) enhances vocal recordings by aligning pitch deviations with intended musical notes. However, existing APC systems either rely on reference pitches, which limits practical applicability, or employ simple pitch estimation algorithms that often fail to preserve expressiveness and naturalness. We propose BERT-APC, a reference-free APC framework that corrects pitch errors while maintaining the expressiveness and naturalness of vocal performances. In BERT-APC, a stationary pitch predictor first estimates the stationary pitch of each note from the detuned singing voice, where stationary pitch is the continuous pitch from the stable region of a note and approximates its perceived pitch. A context-aware note pitch predictor then infers the intended pitch sequence using a repurposed music language model that incorporates musical context. Finally, a note-level…
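To make the notion of stationary pitch concrete, here is a minimal Python sketch. It is not the paper's predictor, which is a learned model. The function names, the `trim` parameter, and the choice of a median over the central part of a note are all assumptions made for illustration. Only the Hz-to-MIDI conversion (A4 = 440 Hz = MIDI 69) is standard. The sketch also shows why snapping to the nearest semitone, the kind of simple estimate the abstract contrasts against, can pick the wrong note when a singer is badly detuned.

```python
import math
from statistics import median


def hz_to_midi(f_hz: float) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def stationary_pitch(f0_hz: list[float], trim: float = 0.25) -> float:
    """Hypothetical stand-in for a stationary pitch estimate.

    Assumption: the stable region is approximated by dropping a fraction
    `trim` of frames at each end of the note (onset glide, release) and
    taking the median of what remains, in MIDI units.
    """
    n = len(f0_hz)
    k = int(n * trim)
    core = f0_hz[k:n - k] or f0_hz  # fall back to all frames for very short notes
    return median(hz_to_midi(f) for f in core)


def nearest_semitone(pitch_midi: float) -> int:
    """Context-free baseline: snap to the closest equal-tempered note."""
    return math.floor(pitch_midi + 0.5)


# A note sung about 39 cents sharp of A4, with a glide in and out.
mild = [400.0, 430.0, 450.0, 450.0, 450.0, 450.0, 440.0, 410.0]
# A note sung about 58 cents sharp of A4: closer to A#4 than to A4.
severe = [455.0] * 8

print(nearest_semitone(stationary_pitch(mild)))    # 69 (A4)
print(nearest_semitone(stationary_pitch(severe)))  # 70 (A#4)
```

If the singer intended A4 in both cases, the context-free baseline gets the second note wrong, because the deviation exceeds half a semitone. This is the failure mode that inferring the intended pitch sequence from musical context is meant to address.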
