When Noise Lowers The Loss: Rethinking Likelihood-Based Evaluation in Music Large Language Models
Xiaosha Li, Chun Liu, Ziyu Wang

TL;DR
This paper shows that in music large language models the cross-entropy loss can decrease on corrupted music, which challenges its use as a quality metric, and it proposes an evaluation method based on the shape of the loss curve to better assess musical quality.
Contribution
It introduces a noise injection method to analyze model responses, demonstrating that loss curve shape reflects musical quality and proposing a profile-based evaluation framework.
Findings
Models respond more to local disruptions than global corruption.
Loss curve shape encodes information about musical quality.
Proposed evaluation is label-free and model-intrinsic.
Abstract
The rise of music large language models (LLMs) demands robust methods of evaluating output quality, especially in distinguishing high-quality compositions from "garbage music". Curiously, we observe that the standard cross-entropy loss, a core training metric, often decreases when models encounter systematically corrupted music, undermining its validity as a standalone quality indicator. To investigate this paradox, we introduce a noise injection experiment, in which controlled noise signals of varying lengths are injected into musical contexts. We hypothesize that a model's loss reacting positively to these perturbations, specifically a sharp increase ("Peak" area) for short injections, can serve as a proxy for its ability to discern musical integrity. Experiments with MusicGen models in the audio waveform domain confirm that Music LLMs respond more strongly to local, texture-level…
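The noise injection procedure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a smoothed bigram model stands in for MusicGen, uniform random tokens stand in for the noise signal, and the hypothetical helper `excess_area` is only a crude proxy for the "Peak" area, whose exact definition is not given in the truncated abstract.

```python
import math
import random
from collections import Counter


def inject_noise(tokens, start, length, vocab_size, rng):
    """Return a copy of tokens with [start, start+length) replaced by random tokens."""
    noisy = list(tokens)
    for i in range(start, min(start + length, len(noisy))):
        noisy[i] = rng.randrange(vocab_size)
    return noisy


def fit_bigram(tokens, vocab_size):
    """Toy stand-in for a music LM (assumption): add-one-smoothed bigram model."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])

    def prob(prev, nxt):
        return (pair_counts[(prev, nxt)] + 1) / (prev_counts[prev] + vocab_size)

    return prob


def loss_curve(prob, tokens):
    """Per-position cross-entropy (nats); entry i scores tokens[i+1] given tokens[i]."""
    return [-math.log(prob(p, n)) for p, n in zip(tokens, tokens[1:])]


def excess_area(clean_curve, noisy_curve, start, length):
    """Sum of (noisy - clean) loss over the positions touched by the injection.

    Hypothetical proxy for the "Peak" area; not the paper's definition.
    """
    lo = max(start - 1, 0)                        # first prediction of a noisy token
    hi = min(start + length, len(clean_curve))    # includes the first clean token after
    return sum(noisy_curve[i] - clean_curve[i] for i in range(lo, hi))


if __name__ == "__main__":
    vocab_size = 16
    clean = [0, 1, 2, 3] * 50                     # a highly regular toy "piece"
    prob = fit_bigram(clean, vocab_size)
    clean_curve = loss_curve(prob, clean)
    rng = random.Random(0)
    for length in (1, 4, 16, 64):                 # varying injection lengths
        noisy = inject_noise(clean, 100, length, vocab_size, rng)
        area = excess_area(clean_curve, loss_curve(prob, noisy), 100, length)
        print(f"injection length {length:3d}: excess loss area = {area:.2f}")
```

A bigram model has no long context, so this toy cannot reproduce the reported effect of loss decreasing under sustained corruption; it only shows the mechanics of injecting noise of varying length and comparing loss curves.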
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
