MQAD: A Large-Scale Question Answering Dataset for Training Music Large Language Models
Zhihao Ouyang, Ju-Chiang Wang, Daiyu Zhang, Bin Chen, Shangjie Li, Quan Lin

TL;DR
MQAD is a large-scale music question-answering dataset derived from the Million Song Dataset. It spans 270,000 tracks and nearly 3 million questions and captions covering beat, chord, key, structure, instrument, and genre, and is used to train a multimodal LLM for music understanding.
Contribution
This paper introduces MQAD, a large-scale music QA dataset with detailed musical features, created using MIR models and LLMs, and demonstrates its effectiveness with a multimodal LLM.
Findings
A model trained on MQAD outperforms traditional music captioning methods.
MQAD covers diverse musical aspects including chords, structure, and genre.
Time-varying annotations such as chords and sections let the dataset support exploration of the structure within a song.
Abstract
Question-answering (QA) is a natural approach for humans to understand a piece of music audio. However, for machines, accessing a large-scale dataset covering diverse aspects of music is crucial, yet challenging, due to the scarcity of publicly available music data of this type. This paper introduces MQAD, a music QA dataset built on the Million Song Dataset (MSD), encompassing a rich array of musical features, including beat, chord, key, structure, instrument, and genre, across 270,000 tracks, featuring nearly 3 million diverse questions and captions. MQAD distinguishes itself by offering detailed time-varying musical information such as chords and sections, enabling exploration into the inherent structure of music within a song. To compile MQAD, our methodology leverages specialized Music Information Retrieval (MIR) models to extract higher-level musical features and Large Language…
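To make the idea of time-varying annotations concrete, the following is a minimal sketch of how chord and section segments could be turned into question-answer pairs. It is not the authors' pipeline: the `Segment` schema, the function names, the chord label format, and the fixed question template are all assumptions made for illustration, and the template stands in for the LLM-based text generation the paper describes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """A time-stamped label such as a chord or a section (hypothetical schema)."""
    start: float  # seconds
    end: float    # seconds
    label: str


def label_at(segments: List[Segment], t: float) -> Optional[str]:
    """Return the label of the segment covering time t, or None if none does."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.label
    return None


def make_qa_pairs(chords: List[Segment],
                  sections: List[Segment]) -> List[Tuple[str, str]]:
    """Build one templated QA pair per section (illustrative stand-in only)."""
    pairs = []
    for sec in sections:
        progression: List[str] = []
        for ch in chords:
            overlaps = ch.start < sec.end and ch.end > sec.start
            # Skip consecutive repeats so the answer reads as a progression.
            if overlaps and (not progression or progression[-1] != ch.label):
                progression.append(ch.label)
        question = (f"Which chords are played in the {sec.label} "
                    f"from {sec.start:.1f}s to {sec.end:.1f}s?")
        answer = ", ".join(progression) if progression else "No chords were detected."
        pairs.append((question, answer))
    return pairs


# Toy annotations; the values are invented for this example.
chords = [
    Segment(0.0, 4.0, "C:maj"),
    Segment(4.0, 8.0, "G:maj"),
    Segment(8.0, 12.0, "A:min"),
    Segment(12.0, 16.0, "F:maj"),
]
sections = [Segment(0.0, 8.0, "intro"), Segment(8.0, 16.0, "verse")]

pairs = make_qa_pairs(chords, sections)
for question, answer in pairs:
    print(question, "->", answer)
```

Keeping chords and sections as separate lists of timed segments means the same lookup works for any time-varying feature, which is one plausible way to relate local events (chords) to larger structure (sections) within a song.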
