TL;DR
This paper introduces a memory-efficient audio-to-lyrics alignment method for long polyphonic music recordings. It spots anchoring words, segments the recording around them, and aligns each segment separately, achieving competitive performance with fewer computational resources.
Contribution
A novel low-memory audio-to-lyrics alignment approach that segments recordings based on anchor words and demonstrates competitive accuracy with reduced resource usage.
Findings
Performs well on benchmark datasets
Requires significantly less memory than state-of-the-art methods
Highlights importance of source separation for accuracy
Abstract
Lyrics alignment in long music recordings can be memory-intensive when performed in a single pass. In this study, we present a novel method that performs audio-to-lyrics alignment with a low memory footprint regardless of the duration of the music recording. The proposed system first spots the anchoring words within the audio signal. The recording is then segmented with respect to these anchors, and a second-pass alignment is performed to obtain the word timings. We show that our audio-to-lyrics alignment system performs competitively with the state-of-the-art while requiring far fewer computational resources. In addition, we utilise our lyrics alignment system to segment the music recordings into sentence-level chunks. On the segmented recordings, we report the lyrics transcription scores on a number of benchmark test sets. Finally, our experiments highlight the…
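The anchor-then-segment idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the first pass has already produced anchors as hypothetical `(word_index, time_in_seconds)` pairs, and the names `Segment` and `segment_by_anchors` are invented for illustration. It only shows how anchors could split a long recording into short spans, each paired with its own slice of the lyrics, so that a second-pass aligner never has to process the full recording at once.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float       # segment start time in seconds
    end: float         # segment end time in seconds
    words: List[str]   # lyrics words assumed to fall inside this span


def segment_by_anchors(
    words: List[str],
    anchors: List[Tuple[int, float]],
    duration: float,
) -> List[Segment]:
    """Split a recording at anchor words.

    `anchors` holds hypothetical (word_index, time_sec) pairs from a
    first-pass word spotter; each accepted anchor starts a new segment.
    """
    # Boundaries always begin at the start of the recording and lyrics.
    bounds: List[Tuple[int, float]] = [(0, 0.0)]
    for idx, t in sorted(anchors):
        # Discard anchors outside the lyrics or the recording. An anchor
        # on the very first word adds no split point, so it is skipped too.
        if not (0 < idx < len(words) and 0.0 < t < duration):
            continue
        # Keep only anchors that advance in both word index and time,
        # which drops spotting errors that would reorder the lyrics.
        if idx <= bounds[-1][0] or t <= bounds[-1][1]:
            continue
        bounds.append((idx, t))
    bounds.append((len(words), duration))

    # Each pair of consecutive boundaries defines one short segment.
    return [
        Segment(start=t0, end=t1, words=words[i0:i1])
        for (i0, t0), (i1, t1) in zip(bounds, bounds[1:])
    ]


lyrics = "a b c d e f".split()
# The anchor (3, 5.0) is inconsistent with (2, 10.0) and is dropped.
segments = segment_by_anchors(lyrics, [(2, 10.0), (3, 5.0), (4, 25.0)], 40.0)
for seg in segments:
    print(f"{seg.start:6.1f} - {seg.end:6.1f}  {' '.join(seg.words)}")
```

In this sketch each segment would then be passed to whatever second-pass aligner is in use; the memory needed per alignment call depends on the segment length rather than on the length of the whole recording, which is the property the abstract emphasises.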
