MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization

Haina Zhu; Yizhi Zhou; Hangting Chen; Jianwei Yu; Ziyang Ma; Rongzhi Gu; Yi Luo; Wei Tan; Xie Chen

arXiv:2501.01108·cs.SD·January 6, 2025

TL;DR

MuQ introduces a self-supervised music representation learning model utilizing Mel Residual Vector Quantization, outperforming previous models on various music understanding tasks with less data and demonstrating strong zero-shot capabilities.

Contribution

The paper proposes MuQ, a novel self-supervised music representation model using Mel-RVQ for improved stability and efficiency, and introduces MuQ-MuLan for state-of-the-art zero-shot music tagging.

Findings

01 MuQ outperforms previous models on multiple downstream tasks.

02 Scaling data and iterative training further improve performance.

03 MuQ-MuLan achieves state-of-the-art zero-shot music tagging results.
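
This page does not spell out how MuQ-MuLan produces zero-shot tags. A minimal sketch, assuming the model maps audio clips and tag text into one shared embedding space (as MuLan-style models do), is to rank tags by cosine similarity to the audio embedding. The function name `zero_shot_tags`, the 3-dimensional toy embeddings, and the tag names are all hypothetical and stand in for real model outputs.

```python
import numpy as np

def zero_shot_tags(audio_emb, tag_embs, tag_names, top_k=2):
    """Rank tags by cosine similarity between one audio embedding and tag embeddings.

    audio_emb: (E,) array, embedding of one music clip (hypothetical model output).
    tag_embs:  (N, E) array, one text embedding per candidate tag.
    tag_names: list of N tag strings.
    """
    a = audio_emb / np.linalg.norm(audio_emb)
    t = tag_embs / np.linalg.norm(tag_embs, axis=1, keepdims=True)
    scores = t @ a                       # cosine similarity per tag, shape (N,)
    order = np.argsort(-scores)[:top_k]  # indices of the highest-scoring tags
    return [(tag_names[i], float(scores[i])) for i in order]

# Toy embeddings chosen by hand; a real system would get these from the model.
tag_names = ["rock", "jazz", "classical"]
tag_embs = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
audio_emb = np.array([1.0, 0.1, 0.0])

ranked = zero_shot_tags(audio_emb, tag_embs, tag_names)
print([name for name, _ in ranked])  # ['rock', 'jazz']
```

No tag-specific classifier is trained in this setup, which is what makes the tagging zero-shot: adding a new tag only requires embedding its text.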

Abstract

Recent years have witnessed the success of foundation models pre-trained with self-supervised learning (SSL) in various music informatics understanding tasks, including music tagging, instrument classification, key detection, and more. In this paper, we propose a self-supervised music representation learning model for music understanding. Unlike previous studies that adopt random projection or an existing neural codec, the proposed model, named MuQ, is trained to predict tokens generated by Mel Residual Vector Quantization (Mel-RVQ). Our Mel-RVQ uses a residual linear projection structure for Mel spectrum quantization to enhance the stability and efficiency of target extraction and lead to better performance. Experiments on a large variety of downstream tasks demonstrate that MuQ outperforms previous self-supervised music representation models with only 0.9K hours of…
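
To make the idea of residual quantization over Mel frames concrete, here is a minimal NumPy sketch of a generic residual vector quantizer with one linear projection per stage. It is not the authors' implementation: the projections and codebooks are random here, whereas a real tokenizer would learn them. The pseudo-inverse back-projection is an illustrative choice, and the name `residual_vq` and all sizes (5 frames, 128 Mel bins, 16-dimensional codes, 32 codes, 8 stages) are assumptions picked for the example.

```python
import numpy as np

def residual_vq(frames, projections, codebooks):
    """Quantize each frame with a stack of residual stages.

    frames:      (T, D) array, e.g. T Mel-spectrogram frames with D bins.
    projections: list of (D, C) matrices, one linear projection per stage.
    codebooks:   list of (K, C) arrays, one codebook per stage.
    Returns (tokens, residual): tokens is a (T, n_stages) integer array,
    residual is what remains of the frames after every stage.
    """
    residual = frames.astype(np.float64).copy()
    tokens = []
    for W, codebook in zip(projections, codebooks):
        z = residual @ W                                           # (T, C)
        # Squared Euclidean distance from each projected frame to each code.
        d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K)
        idx = d.argmin(axis=1)                                     # nearest code
        tokens.append(idx)
        # Map the chosen code back to the input space (pseudo-inverse is an
        # assumption of this sketch) and pass on only the unexplained part.
        residual = residual - codebook[idx] @ np.linalg.pinv(W)
    return np.stack(tokens, axis=1), residual

rng = np.random.default_rng(0)
T, D, C, K, n_stages = 5, 128, 16, 32, 8   # arbitrary sizes for illustration
mel = rng.standard_normal((T, D))          # stand-in for real Mel frames
projections = [rng.standard_normal((D, C)) / np.sqrt(D) for _ in range(n_stages)]
codebooks = [rng.standard_normal((K, C)) for _ in range(n_stages)]

tokens, residual = residual_vq(mel, projections, codebooks)
print(tokens.shape)  # (5, 8): one token per frame per stage
```

Each frame ends up with one discrete token per stage. In an SSL setup of the kind the abstract describes, such tokens would serve as the prediction targets for the representation model.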

Code & Models

Repositories

tencent-ailab/muq (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing