Improving Lyrics Alignment through Joint Pitch Detection
Jiawen Huang, Emmanouil Benetos, Sebastian Ewert

TL;DR
This paper introduces a multi-task learning method that trains lyrics alignment jointly with pitch detection, using the high temporal accuracy of pitch annotations to improve the timing precision of lyrics alignment for singing voice.
Contribution
The paper presents a joint pitch detection and lyrics alignment framework that exploits pitch, a musical property usually ignored by systems built on speech recognition frameworks, to improve alignment accuracy.
Findings
Improved lyrics alignment accuracy with joint pitch information
Boundary detection reduces cross-line errors
Enhanced alignment performance over traditional methods
Abstract
In recent years, the accuracy of automatic lyrics alignment methods has increased considerably. Yet many current approaches employ frameworks designed for automatic speech recognition (ASR) and do not exploit properties specific to music. Pitch is an important musical attribute of the singing voice, but it is often ignored by current systems because the lyrics content is considered independent of the pitch. In practice, however, there is a temporal correlation between the two, as note starts often correlate with phoneme starts. At the same time, pitch is usually annotated with high temporal accuracy in ground truth data, while the timing of lyrics is often only available at the line (or word) level. In this paper, we propose a multi-task learning approach for lyrics alignment that incorporates pitch and thus can make use of a new source of highly accurate temporal information. Our results show…
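The multi-task idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss functions, so the frame-wise cross-entropy terms, the function names, and the `pitch_weight` parameter below are all assumptions chosen only to show how a pitch objective can be added to a lyrics objective so that both shape the same model.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under one probability vector."""
    return -math.log(probs[target])

def multitask_loss(phoneme_probs, phoneme_targets,
                   pitch_probs, pitch_targets, pitch_weight=0.5):
    """Hypothetical joint objective: lyrics loss plus a weighted pitch loss.

    Each *_probs argument is a list of per-frame probability vectors and each
    *_targets argument is a list of per-frame class indices. The weighting
    scheme is an assumption, not taken from the paper.
    """
    n_frames = len(phoneme_targets)
    phoneme_loss = sum(
        cross_entropy(p, t) for p, t in zip(phoneme_probs, phoneme_targets)
    ) / n_frames
    pitch_loss = sum(
        cross_entropy(p, t) for p, t in zip(pitch_probs, pitch_targets)
    ) / n_frames
    return phoneme_loss + pitch_weight * pitch_loss

# Two frames, two phoneme classes, four pitch classes, uniform predictions.
phoneme_probs = [[0.5, 0.5], [0.5, 0.5]]
pitch_probs = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
loss = multitask_loss(phoneme_probs, [0, 1], pitch_probs, [2, 3])
```

Setting `pitch_weight` to 0 reduces the sketch to a lyrics-only objective, which makes the contribution of the pitch term easy to isolate in an ablation.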
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
