OpenMU: Your Swiss Army Knife for Music Understanding

Mengjie Zhao; Zhi Zhong; Zhuoyuan Mao; Shiqi Yang; Wei-Hsiang Liao; Shusuke Takahashi; Hiromi Wakaki; Yuki Mitsufuji

arXiv:2410.15573·cs.SD·November 28, 2024

TL;DR

OpenMU-Bench is a large-scale benchmark suite that addresses data scarcity in training multimodal language models for music understanding and extends the task scope to lyrics understanding and music tool usage. The OpenMU model trained on it outperforms baselines such as MU-Llama.

Contribution

We introduce OpenMU-Bench, a large-scale benchmark for multimodal music understanding, and develop OpenMU, a model that outperforms baselines such as MU-Llama. Both are open-sourced for future research.

Findings

01. OpenMU outperforms baseline models such as MU-Llama.

02. OpenMU-Bench broadens the scope of music understanding.

03. OpenMU and OpenMU-Bench are open-sourced.

Abstract

We present OpenMU-Bench, a large-scale benchmark suite for addressing the data scarcity issue in training multimodal language models to understand music. To construct OpenMU-Bench, we leveraged existing datasets and bootstrapped new annotations. OpenMU-Bench also broadens the scope of music understanding by including lyrics understanding and music tool usage. Using OpenMU-Bench, we trained our music understanding model, OpenMU, with extensive ablations, demonstrating that OpenMU outperforms baseline models such as MU-Llama. Both OpenMU and OpenMU-Bench are open-sourced to facilitate future research in music understanding and to enhance creative music production efficiency.

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 5 · Confidence 5

Strengths

This paper has two strengths: 1. The authors make significant efforts in collecting and annotating music datasets using an LLM. This contribution provides valuable resources for the music community, supporting further development of LM-based music understanding models. 2. The evaluation of the proposed OpenMU is thorough, with comparisons to baseline models such as MU-Llama, MusiLingo, and M2UGen. The paper also includes ablation studies examining the effects of token size/length and various lo

Weaknesses

The paper contains two weaknesses: 1. **Limited Novelty**: The contributions of this paper are somewhat constrained. It should focus on either advancing the benchmark for music understanding or improving LM-based music understanding models. While the authors have invested considerable effort in data collection, task formulation, and initial evaluation of the benchmark, there are no new designs for model architecture or evaluation metrics except a new task termed "tool using". Furthermore, it sh

Reviewer 02 · Rating 3 · Confidence 4

Strengths

1. Standardizing the evaluation metrics for text-generation tasks in OpenMU-Bench is a smart step. It enhances the consistency and fairness of benchmarking, allowing for more effective comparisons between various music-understanding models. 2. The thorough exploration of key factors in training OpenMU is useful. For instance, examining how the number of music tokens impacts training efficiency and model convergence provides reusable insights for future research.

Weaknesses

1. The proposed model and dataset primarily rely on established techniques and common practices in the field, resulting in a lack of novelty: 1) Although the paper utilizes existing datasets and employs GPT-3.5 to generate new annotations, the underlying data sources are mainly based on pre-existing music-related datasets, with no introduction of new music data or innovative data-collection methods. 2) Regarding the model architecture, the use of AudioMAE for encoding music clips, Llama 3 as the

Reviewer 03 · Rating 5 · Confidence 4

Strengths

The paper's strengths lie in its novel contribution to the field of music information retrieval (MIR) through the creation of OpenMU-Bench, a large-scale benchmark that significantly expands the scope of music understanding tasks. The benchmark's comprehensiveness is a notable advantage, as it covers various aspects of music understanding, which is crucial for developing well-rounded multimodal language models. Additionally, the paper demonstrates OpenMU's superior performance over existing mode

Weaknesses

1. LLark has published its source code at https://github.com/spotify-research/llark, contrary to the paper's claim that it has not open-sourced its models and datasets. 2. OpenMU lacks innovation, being derived from previous works with limited novelty in training. 3. OpenMU-Bench lacks discussion on its construction, including data handling and annotation, limiting its practical application value. 4. The paper does not clarify the overlap between training and testing sets in OpenMU-Bench or e

Reviewer 04 · Rating 1 · Confidence 5

Strengths

This paper proposes a multimodal large language model, OpenMU, that handles a comprehensive range of MIR tasks and outperforms the existing MU-LLAMA model. In addition, the paper establishes the publicly available dataset OpenMU-Bench, which is large-scale and comprehensive.

Weaknesses

The primary weakness of this paper is that its content and experimental results do not convincingly support its claimed contributions. Here are the specific issues: 1. Ambiguity in Contribution between Benchmark and Model: The relationship between the benchmark (including the dataset) and the model is unclear. In the abstract, the benchmark appears to be the main contribution, with the model serving to demonstrate the dataset’s capabilities and provide an example usage. However, in the introduc

Taxonomy

Topics: Music Technology and Sound Studies · Diverse Musicological Studies · Music and Audio Processing