SongTrans: An unified song transcription and alignment method for lyrics and notes

Siwei Wu; Jinzheng He; Ruibin Yuan; Haojie Wei; Xipin Wei; Chenghua Lin; Jin Xu; Junyang Lin

arXiv:2409.14619·cs.SD·October 11, 2024

TL;DR

SongTrans is a unified model that simultaneously transcribes lyrics and notes from songs and aligns them, eliminating the need for pre-processing and improving efficiency in singing voice synthesis tasks.

Contribution

The paper introduces SongTrans, the first model capable of joint lyric and note transcription with alignment, trained on annotated data and optimized for real-world song diversity.

Findings

01. Achieves state-of-the-art results in lyric and note transcription.

02. First model to effectively align lyrics with notes.

03. Demonstrates versatility across different song types.

Abstract

The quantity of processed data is crucial for advancing the field of singing voice synthesis. While tools exist for lyric or note transcription, they all require pre-processed data, and the pre-processing itself (e.g., vocal and accompaniment separation) is relatively time-consuming. Moreover, most of these tools address a single task and struggle to align lyrics and notes (i.e., to identify the notes corresponding to each word in the lyrics). To address these challenges, we first design a pipeline by optimizing existing tools and annotating numerous lyric-note pairs of songs. Then, based on the annotated data, we train a unified SongTrans model that can directly transcribe lyrics and notes while aligning them simultaneously, without requiring the songs to be pre-processed. Our SongTrans model consists of two modules: (1) the Autoregressive module predicts the lyrics, along with…
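To make the alignment task in the abstract concrete, the sketch below shows one possible way to represent aligned lyric-note pairs and to attach notes to words by time overlap. It is only an illustration of the task's output, not the SongTrans method: the `Note` and `Word` classes, the MIDI pitch representation, the timestamps, and the `align_notes_to_words` helper are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    pitch: int      # MIDI pitch number (assumed representation)
    onset: float    # start time in seconds
    offset: float   # end time in seconds


@dataclass
class Word:
    text: str
    onset: float    # start time in seconds
    offset: float   # end time in seconds
    notes: list = field(default_factory=list)


def align_notes_to_words(words, notes):
    """Attach each note to the word whose time span overlaps it the most.

    Hypothetical helper: a simple overlap heuristic, not the paper's model.
    Notes that overlap no word are left unassigned.
    """
    for word in words:
        word.notes = []
    for note in notes:
        best_word, best_overlap = None, 0.0
        for word in words:
            overlap = min(word.offset, note.offset) - max(word.onset, note.onset)
            if overlap > best_overlap:
                best_word, best_overlap = word, overlap
        if best_word is not None:
            best_word.notes.append(note)
    return words


# Made-up example: two words, the second sung over two notes.
words = [Word("hello", 0.0, 0.5), Word("world", 0.5, 1.2)]
notes = [Note(60, 0.0, 0.5), Note(62, 0.5, 0.9), Note(64, 0.9, 1.2)]
aligned = align_notes_to_words(words, notes)
for word in aligned:
    print(word.text, [n.pitch for n in word.notes])
```

A heuristic like this needs word and note timestamps from separate tools, which is the kind of multi-step processing the abstract says a unified model is meant to avoid.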

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Speech Recognition and Synthesis