Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation
Nghia Phan, Rong Jin, Gang Liu, Xiao Dong

TL;DR
This paper introduces a two-stage training pipeline for automatic chord recognition that leverages pre-trained models and unlabeled audio. After the second stage, the BTC student surpasses its supervised baseline by 2.5%.
Contribution
The paper proposes a pseudo-labeling and knowledge distillation approach that improves chord recognition accuracy using unlabeled audio and pre-trained models.
Findings
The BTC student model achieves over 99% of the teacher's performance using pseudo-labels.
The BTC student surpasses the supervised baseline by 2.5% after stage 2 training.
Both models show large gains on rare chord qualities.
Abstract
Automatic Chord Recognition (ACR) is constrained by the scarcity of aligned chord labels, as well-aligned annotations are costly to acquire. At the same time, open-weight pre-trained models are currently more accessible than their proprietary training data. In this work, we present a two-stage training pipeline that leverages pre-trained models together with unlabeled audio. The proposed method decouples training into two stages. In the first stage, we use a pre-trained BTC model as a teacher to generate pseudo-labels for over 1,000 hours of diverse unlabeled audio and train a student model solely on these pseudo-labels. In the second stage, the student is continually trained on ground-truth labels as they become available. To prevent catastrophic forgetting of the representations learned in the first stage, we apply selective knowledge distillation (KD) from the teacher as a…
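The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the temperature, the weight `alpha`, and the use of hard argmax pseudo-labels with a KL-divergence distillation term are all assumptions made for illustration. Because the abstract is cut off before it explains what makes the distillation "selective", this sketch applies the KD term to every frame and does not model any selection rule.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last (chord-class) axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def pseudo_labels(teacher_logits):
    """Stage 1 (assumed form): hard per-frame chord labels taken from the teacher."""
    return teacher_logits.argmax(axis=-1)

def cross_entropy(student_logits, labels):
    """Mean per-frame cross-entropy against integer chord labels."""
    log_p = log_softmax(student_logits)
    return float(-log_p[np.arange(len(labels)), labels].mean())

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean per-frame KL(teacher || student) on temperature-softened outputs."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl.mean())

def stage2_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Stage 2 (assumed form): ground-truth loss plus a weighted KD regularizer.

    alpha is a hypothetical weight, not a value taken from the paper.
    """
    return cross_entropy(student_logits, labels) + alpha * kd_loss(
        student_logits, teacher_logits
    )

# Toy example: 2 audio frames, 3 chord classes.
teacher = np.array([[2.0, 0.1, -1.0], [0.0, 3.0, 0.5]])
student = np.array([[1.0, 0.5, 0.0], [0.2, 1.5, 0.3]])

stage1_targets = pseudo_labels(teacher)          # labels for unlabeled audio
stage1_loss = cross_entropy(student, stage1_targets)

ground_truth = np.array([0, 1])                  # labels that became available
total = stage2_loss(student, teacher, ground_truth)
```

In stage 1 the student sees only `stage1_targets`; in stage 2 the KD term pulls the student back toward the teacher's output distribution while it fits the ground-truth labels, which is one common way to limit forgetting.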
