MusT-RAG: Musical Text Question Answering with Retrieval Augmented Generation
Daeyong Kwon, SeungHeon Doh, Juhan Nam

TL;DR
MusT-RAG enhances large language models for music question answering by integrating a music-specific retrieval system, significantly improving domain adaptation and outperforming traditional fine-tuning methods.
Contribution
The paper introduces MusT-RAG, a novel retrieval-augmented framework with a specialized music database, improving LLMs' performance in music question answering tasks.
Findings
MusT-RAG outperforms fine-tuning in music domain adaptation.
MusWikiDB is more effective than Wikipedia for music retrieval.
MusT-RAG yields significant improvements on both in-domain and out-of-domain benchmarks.
Abstract
Recent advancements in large language models (LLMs) have demonstrated remarkable capabilities across diverse domains. While they exhibit strong zero-shot performance on various tasks, LLMs' effectiveness in music-related applications remains limited due to the relatively small proportion of music-specific knowledge in their training data. To address this limitation, we propose MusT-RAG, a comprehensive framework based on Retrieval Augmented Generation (RAG) to adapt general-purpose LLMs for text-only music question answering (MQA) tasks. RAG is a technique that provides external knowledge to LLMs by retrieving relevant context information when generating answers to questions. To optimize RAG for the music domain, we (1) propose MusWikiDB, a music-specialized vector database for the retrieval stage, and (2) utilize context information during both inference and fine-tuning processes to…
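The retrieve-then-generate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three passages, the bag-of-words cosine scorer, and the helper names (`retrieve`, `build_prompt`) are all assumptions made for illustration. A vector database such as MusWikiDB would presumably use learned embeddings rather than word counts, and the final prompt would be passed to an LLM, which is omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical toy passages standing in for a music knowledge base.
PASSAGES = [
    "A fugue is a contrapuntal composition built on a subject that is imitated across voices.",
    "The sitar is a plucked string instrument used in Hindustani classical music.",
    "Bebop is a jazz style marked by fast tempos and complex chord progressions.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]

def build_prompt(question, contexts):
    """Prepend the retrieved passages to the question as context."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"

question = "Which jazz style is known for fast tempos?"
top = retrieve(question, PASSAGES, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In this sketch the retrieved passage is only injected at inference time; using retrieved context during fine-tuning, which the abstract also mentions, is not shown.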
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
