PoLyScriber: Integrated Fine-tuning of Extractor and Lyrics Transcriber for Polyphonic Music

Xiaoxue Gao; Chitralekha Gupta; Haizhou Li

arXiv:2207.07336 · eess.AS · May 8, 2023

TL;DR

PoLyScriber is an end-to-end framework that jointly fine-tunes vocal extraction and lyrics transcription models, significantly improving lyrics transcription accuracy in polyphonic music.

Contribution

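It introduces an integrated fine-tuning approach that optimizes the vocal extractor front end and the lyrics transcriber back end together, addressing two limitations of the traditional two-step pipeline: imperfect vocal extraction and mismatch between separately trained front end and back end.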

Findings

01 Substantial performance improvements over existing methods.

02 Effective joint optimization enhances lyrics transcription accuracy.

03 Demonstrated on publicly available datasets.

Abstract

Lyrics transcription of polyphonic music is challenging as the background music affects lyrics intelligibility. Typically, lyrics transcription can be performed by a two-step pipeline, i.e., a singing vocal extraction front end followed by a lyrics transcriber back end, where the front end and back end are trained separately. Such a two-step pipeline suffers from both imperfect vocal extraction and mismatch between front end and back end. In this work, we propose a novel end-to-end integrated fine-tuning framework, which we call PoLyScriber, to globally optimize the vocal extractor front end and lyrics transcriber back end for lyrics transcription in polyphonic music. The experimental results show that our proposed PoLyScriber achieves substantial improvements over the existing approaches on publicly available test datasets.
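
The contrast between the two-step pipeline and integrated fine-tuning can be illustrated with a deliberately tiny numerical sketch. Everything in it is an assumption made for illustration and is not the paper's model: the "extractor" is a two-weight linear mask over a vocal feature and a music feature, the "transcriber" is a single scalar weight, the target is a made-up function of the vocal alone, and the loss is plain mean squared error. The names `make_data`, `forward`, `mse`, and `train` are hypothetical. The only point is that when the front end is frozen with some music leakage, the back end cannot fully compensate, whereas letting the same loss update both modules can remove the leakage.

```python
import random


def make_data(n=200, seed=0):
    """Toy data: (vocal, music, target). The target depends on the vocal only."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        vocal = rng.uniform(-1.0, 1.0)
        music = rng.uniform(-1.0, 1.0)
        data.append((vocal, music, 2.0 * vocal))
    return data


def forward(params, vocal, music):
    """Front end (linear mask) followed by back end (scalar weight)."""
    mask_vocal, mask_music, weight = params
    extracted = mask_vocal * vocal + mask_music * music  # toy "vocal extractor"
    prediction = weight * extracted                      # toy "transcriber"
    return extracted, prediction


def mse(params, data):
    """Mean squared error of the full pipeline on the data set."""
    return sum((forward(params, v, m)[1] - t) ** 2 for v, m, t in data) / len(data)


def train(params, data, update_extractor, lr=0.05, epochs=500):
    """Full-batch gradient descent on the end-to-end loss.

    update_extractor=False mimics the two-step pipeline (front end frozen);
    update_extractor=True mimics integrated fine-tuning (both modules updated).
    """
    mask_vocal, mask_music, weight = params
    n = len(data)
    for _ in range(epochs):
        g_mask_vocal = g_mask_music = g_weight = 0.0
        for v, m, t in data:
            extracted, pred = forward((mask_vocal, mask_music, weight), v, m)
            err = 2.0 * (pred - t) / n            # d(loss)/d(prediction)
            g_weight += err * extracted           # gradient for the back end
            g_mask_vocal += err * weight * v      # gradient flows through the
            g_mask_music += err * weight * m      # back end into the front end
        weight -= lr * g_weight
        if update_extractor:
            mask_vocal -= lr * g_mask_vocal
            mask_music -= lr * g_mask_music
    return (mask_vocal, mask_music, weight)


data = make_data()
init = (0.8, 0.3, 1.0)  # assumed "pretrained" extractor that leaks some music

two_step = train(init, data, update_extractor=False)
joint = train(init, data, update_extractor=True)

print("two-step loss:", mse(two_step, data))
print("joint loss:   ", mse(joint, data))
```

In this toy setting the jointly trained pipeline ends with a lower loss than the frozen-extractor pipeline, because the gradient of the transcription loss is allowed to shrink the music weight of the mask. Real systems replace each scalar with a neural network and the squared error with a transcription loss, so the sketch says nothing about the size of the gains reported in the paper.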

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Test