Adversarial Learning for Improved Onsets and Frames Music Transcription
Jong Wook Kim, Juan Pablo Bello

TL;DR
This paper introduces an adversarial training scheme for music transcription that models inter-label dependencies and consistently improves frame-level and note-level metrics over the Onsets and Frames model.
Contribution
It proposes a novel adversarial learning approach applied directly to time-frequency representations, enhancing transcription performance over traditional supervised models.
Findings
Significant reduction in error rates.
Improved frame-level and note-level metrics.
Enhanced confidence in model estimations.
Abstract
Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance. These approaches commonly employ supervised learning models that predict various time-frequency representations, by minimizing element-wise losses such as the cross entropy function. However, applying the loss in this manner assumes conditional independence of each label given the input, and thus cannot accurately express inter-label dependencies. To address this issue, we introduce an adversarial training scheme that operates directly on the time-frequency representations and makes the output distribution closer to the ground-truth. Through adversarial learning, we achieve a consistent improvement in both frame-level and note-level metrics over Onsets and Frames, a…
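The independence assumption mentioned in the abstract can be made concrete with a small sketch. The code below is not the paper's implementation; it is a toy illustration in plain Python, using a hypothetical 2-frame by 3-pitch piano roll and made-up prediction values. It shows that summing binary cross entropy over every cell equals the negative log-likelihood of a model in which each cell is an independent Bernoulli variable given the input. This is why a purely element-wise loss cannot express dependencies between labels.

```python
import math


def elementwise_bce(pred, target):
    """Sum of binary cross-entropy terms over every (frame, pitch) cell."""
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, y in zip(pred_row, target_row):
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total


def independent_joint_likelihood(pred, target):
    """Probability of the whole piano roll if each cell is an independent Bernoulli."""
    prob = 1.0
    for pred_row, target_row in zip(pred, target):
        for p, y in zip(pred_row, target_row):
            prob *= p if y == 1 else 1 - p
    return prob


# Hypothetical toy data: rows are frames, columns are pitches.
target = [[1, 0, 0], [1, 0, 1]]
pred = [[0.9, 0.2, 0.1], [0.7, 0.1, 0.6]]

loss = elementwise_bce(pred, target)
nll = -math.log(independent_joint_likelihood(pred, target))

# The two quantities agree up to floating-point error.
print(abs(loss - nll) < 1e-9)
```

Because the loss factorizes over cells, it scores each cell separately and ignores how the cells combine into a plausible piano roll. An adversarial term, as described in the abstract, instead evaluates the output representation as a whole.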
Taxonomy
Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Diverse Musicological Studies
