MusiCoder: A Universal Music-Acoustic Encoder Based on Transformers

Yilun Zhao; Jia Guo

arXiv:2008.00781 · eess.AS · February 2, 2021

TL;DR

MusiCoder introduces a self-supervised transformer-based music acoustic encoder that improves music annotation tasks by leveraging masked reconstruction pre-training on unlabeled data.

Contribution

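The paper proposes a self-supervised method for learning music acoustic representations with bidirectional transformers. It introduces two masking objectives, Contiguous Frames Masking (CFM) and Contiguous Channels Masking (CCM), and reports that the resulting encoder outperforms existing models.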

Findings

01 Outperforms state-of-the-art models in genre classification

02 Outperforms existing models in auto-tagging tasks

03 Demonstrates the effectiveness of self-supervised pre-training for music understanding

Abstract

Music annotation has always been one of the critical topics in the field of Music Information Retrieval (MIR). Traditional models use supervised learning for music annotation tasks. However, as supervised machine learning approaches increase in complexity, the growing need for annotated training data often cannot be met by the data available. In this paper, a new self-supervised music acoustic representation learning approach named MusiCoder is proposed. Inspired by the success of BERT, MusiCoder builds upon the architecture of self-attention bidirectional transformers. Two pre-training objectives, Contiguous Frames Masking (CFM) and Contiguous Channels Masking (CCM), are designed to adapt BERT-like masked reconstruction pre-training to the continuous acoustic frame domain. The performance of MusiCoder is evaluated in two downstream music annotation tasks. The results…
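
The two masking objectives named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the span and block lengths, the choice to zero out masked values, and the L1 reconstruction loss are all assumptions made here for illustration. The input is assumed to be a feature matrix of shape (frames, channels), such as a log-mel spectrogram.

```python
import numpy as np


def mask_contiguous_frames(feats, span_len, num_spans, rng):
    """CFM-style masking (illustrative): hide runs of consecutive time frames.

    feats: array of shape (num_frames, num_channels).
    Returns a masked copy and a boolean array marking the hidden entries.
    Zeroing the hidden values is an assumption of this sketch.
    """
    masked = feats.copy()
    hidden = np.zeros(feats.shape, dtype=bool)
    num_frames = feats.shape[0]
    for _ in range(num_spans):
        # Pick a start so the whole span fits; spans may overlap.
        start = rng.integers(0, num_frames - span_len + 1)
        hidden[start:start + span_len, :] = True
    masked[hidden] = 0.0
    return masked, hidden


def mask_contiguous_channels(feats, block_len, rng):
    """CCM-style masking (illustrative): hide one block of adjacent channels
    across every time frame."""
    masked = feats.copy()
    hidden = np.zeros(feats.shape, dtype=bool)
    num_channels = feats.shape[1]
    start = rng.integers(0, num_channels - block_len + 1)
    hidden[:, start:start + block_len] = True
    masked[hidden] = 0.0
    return masked, hidden


def masked_l1_loss(pred, target, hidden):
    """Mean absolute error over hidden entries only (assumed loss)."""
    return float(np.abs(pred - target)[hidden].mean())


# Hypothetical input: 100 frames with 80 channels of random features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 80))
cfm_in, cfm_hidden = mask_contiguous_frames(feats, span_len=5, num_spans=3, rng=rng)
ccm_in, ccm_hidden = mask_contiguous_channels(feats, block_len=8, rng=rng)
```

In a pre-training loop of this kind, the encoder would receive `cfm_in` or `ccm_in` and be trained to reconstruct `feats`, with the loss computed only where the boolean mask is true.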

Taxonomy

Methods: Linear Layer · Weight Decay · WordPiece · Softmax · Dense Connections · Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Residual Connection