Pseudo-Label Transfer from Frame-Level to Note-Level in a Teacher-Student Framework for Singing Transcription from Polyphonic Music

Sangeun Kum; Jongpil Lee; Keunhyoung Luke Kim; Taehyoung Kim; Juhan Nam

arXiv:2203.13422·eess.AS·March 31, 2022

TL;DR

This paper tackles singing transcription from polyphonic music by converting frame-level pseudo labels, produced by vocal pitch estimation models, into note-level labels. It then refines those labels through self-training in a teacher-student framework, so that large-scale unlabeled audio can be used for training.

Contribution

The paper proposes a method that converts frame-level pseudo labels to note-level labels through pitch and rhythm quantization, and it improves transcription performance through self-training in a teacher-student setup.

Findings

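01 Large-scale unlabeled audio data can be used effectively for singing transcription.

02 Self-training with a noisy student model improves performance when starting from noisy pseudo labels.

03 Training on unlabeled data alone can achieve performance comparable to training on labeled data.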

Abstract

The lack of large-scale note-level labeled data is the major obstacle to singing transcription from polyphonic music. We address the issue by using pseudo labels that vocal pitch estimation models produce from unlabeled data. The proposed method first converts the frame-level pseudo labels to note-level labels through pitch and rhythm quantization steps. Then, it further improves the label quality through self-training in a teacher-student framework. To validate the method, we run experiments across several settings, investigating two vocal pitch estimation models as pseudo-label generators, two setups of teacher-student frameworks, and the number of iterations in self-training. The results show that the proposed method can effectively leverage large-scale unlabeled audio data and that self-training with the noisy student model helps to improve performance. Finally, we show that the model trained with only…
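
The frame-to-note conversion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed 10 ms hop between pitch frames, nearest-semitone rounding for pitch quantization, and a fixed time grid for rhythm quantization. The names `hz_to_midi`, `frames_to_notes`, `frames_per_step`, and `min_steps` are hypothetical and chosen only for illustration.

```python
import math


def hz_to_midi(f0_hz):
    """Pitch quantization: nearest MIDI semitone, or None for an unvoiced frame."""
    if f0_hz <= 0:
        return None
    return int(round(69 + 12 * math.log2(f0_hz / 440.0)))


def frames_to_notes(f0_track, hop_sec=0.01, frames_per_step=5, min_steps=1):
    """Turn a frame-level F0 track (Hz, 0 = unvoiced) into (onset, offset, midi) notes.

    Assumed procedure (illustrative only):
      1. quantize every frame to a semitone,
      2. merge consecutive frames that share the same semitone,
      3. snap segment boundaries to a grid of `frames_per_step` frames,
      4. drop segments shorter than `min_steps` grid steps.
    """
    pitches = [hz_to_midi(f) for f in f0_track]
    step_sec = frames_per_step * hop_sec
    notes = []
    start = 0
    for i in range(1, len(pitches) + 1):
        # A segment ends at the last frame or when the quantized pitch changes.
        if i == len(pitches) or pitches[i] != pitches[start]:
            if pitches[start] is not None:
                onset_step = round(start / frames_per_step)
                offset_step = round(i / frames_per_step)
                if offset_step - onset_step >= min_steps:
                    notes.append((
                        round(onset_step * step_sec, 6),
                        round(offset_step * step_sec, 6),
                        pitches[start],
                    ))
            start = i
    return notes


# Toy frame-level track: silence, a slightly wobbly A4, a C5, then silence.
example_track = [0.0] * 5 + [440.0] * 10 + [445.0] * 2 + [523.25] * 9 + [0.0] * 4
example_notes = frames_to_notes(example_track)
```

In this sketch the small deviation from 440 Hz to 445 Hz rounds to the same semitone, so those frames merge into one note, and very short voiced blips vanish once their boundaries snap to the same grid step. Note-level labels of this kind are what a teacher-student loop would then refine.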

Code & Models

Repositories

keums/icassp2022-vocal-transcription (tf · Official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies

Methods: Dropout · RandAugment · Stochastic Depth · Noisy Student