End-to-End Lyrics Recognition with Self-supervised Learning
Xiangyu Zhang, Shuyue Stella Li, Zhanhong He, Roberto Togneri, Leibny Paola Garcia

TL;DR
This paper introduces an end-to-end self-supervised learning approach for lyrics recognition, demonstrating significant performance improvements over previous methods and analyzing the impact of background music and domain generalization.
Contribution
It establishes a baseline for end-to-end lyrics recognition, evaluates a range of SSL models, shows that they perform well without a language model trained on a large corpus, and analyzes the effect of background music.
Findings
SSL models outperform the previous SOTA system by 5.23% on the dev set and 2.4% on the test set
Background music hampers SSL feature extraction efficiency
SSL models exhibit limited out-of-domain generalization
Abstract
Lyrics recognition is an important task in music processing. Although traditional algorithms such as the hybrid HMM-TDNN model achieve good performance, few studies have applied end-to-end models and self-supervised learning (SSL) to this task. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models on the lyrics recognition task. We evaluate a variety of upstream SSL models with different training methods (masked reconstruction, masked prediction, autoregressive reconstruction, and contrastive learning). Our end-to-end self-supervised models, evaluated on the DAMP music dataset, outperform the previous state-of-the-art (SOTA) system by 5.23% on the dev set and 2.4% on the test set, even without a language model trained on a large corpus. Moreover, we investigate the effect of background music on the performance of…
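The abstract reports the SSL gains as percentages without naming the metric. Lyrics recognition is conventionally scored by word error rate (WER), so the sketch below assumes WER is the metric here. It computes WER with a word-level edit distance and shows the two ways a gain such as "5.23%" could be read: absolute percentage points or a relative reduction. The summary does not say which reading applies. The function names and the example lyric are illustrative assumptions, not taken from the paper.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit-distance) dynamic program."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def absolute_gain(baseline_wer: float, new_wer: float) -> float:
    """Hypothetical helper: improvement in percentage points of WER."""
    return (baseline_wer - new_wer) * 100


def relative_gain(baseline_wer: float, new_wer: float) -> float:
    """Hypothetical helper: improvement as a percentage of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer * 100


if __name__ == "__main__":
    ref = "twinkle twinkle little star how i wonder what you are"
    hyp = "twinkle little star how i wander what you are"
    # One deletion ("twinkle") and one substitution ("wonder" -> "wander") over 10 words
    print(word_error_rate(ref, hyp))
    # Made-up WERs, only to contrast the two readings of a reported gain
    print(absolute_gain(0.20, 0.15), relative_gain(0.20, 0.15))
```

With the made-up WERs above, the same change is a 5-point absolute gain but a 25% relative gain. This is why it matters which reading a reported improvement uses.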
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies
