Melodia: Training-Free Music Editing Guided by Attention Probing in Diffusion Models

Yi Yang; Haowen Li; Tianxiang Li; Boyu Cao; Xiaohan Zhang; Liqun Chen; Qi Liu

arXiv:2511.08252 · cs.SD · November 19, 2025

TL;DR

Melodia is a training-free music editing method that manipulates self-attention maps in diffusion models to accurately modify musical attributes while preserving the original temporal structure, outperforming existing techniques.

Contribution

This paper introduces Melodia, a training-free approach that uses attention probing to guide music editing: it selectively manipulates self-attention maps in a diffusion model instead of retraining the model.

Findings

01 Melodia effectively preserves the source music's structure during editing.

02 Compared with existing techniques, the method adheres more closely to textual descriptions while maintaining structural integrity.

03 The proposed metrics provide a better evaluation of music editing quality.

Abstract

Text-to-music generation technology is progressing rapidly, creating new opportunities for musical composition and editing. However, existing music editing methods often fail to preserve the source music's temporal structure, including melody and rhythm, when altering particular attributes like instrument, genre, and mood. To address this challenge, this paper conducts an in-depth probing analysis on attention maps within AudioLDM 2, a diffusion-based model commonly used as the backbone for existing music editing methods. We reveal a key finding: cross-attention maps encompass details regarding distinct musical characteristics, and interventions on these maps frequently result in ineffective modifications. In contrast, self-attention maps are essential for preserving the temporal structure of the source music during its conversion into the target music. Building upon this understanding,…
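As a rough illustration of what "manipulating self-attention maps" can mean, the sketch below records the self-attention map from a source pass and reuses it in a target pass, so the target's value vectors are mixed with the source's temporal weighting. This is a generic toy example in plain Python, not Melodia's algorithm (the abstract above is truncated before the method is described) and not AudioLDM 2 code; the function names, the `injected_map` parameter, and the toy values are all assumptions made for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_map(queries, keys):
    """Row-stochastic map softmax(Q K^T / sqrt(d)), one row per query frame."""
    d = len(queries[0])
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys])
        for q in queries
    ]

def apply_map(attn, values):
    """Mix the value vectors using the attention weights of each row."""
    dim = len(values[0])
    return [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
        for row in attn
    ]

def self_attention(q, k, v, injected_map=None):
    """Toy self-attention; `injected_map` (hypothetical) overrides the computed map."""
    attn = injected_map if injected_map is not None else attention_map(q, k)
    return apply_map(attn, v), attn

# Toy features: 3 time frames, feature dimension 2 (values are arbitrary).
q_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k_src = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
v_src = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q_tgt = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
k_tgt = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
v_tgt = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.3]]

# Source pass: compute and record the self-attention map.
_, src_map = self_attention(q_src, k_src, v_src)

# Target pass: reuse the source map, so only the values come from the target.
edited, used_map = self_attention(q_tgt, k_tgt, v_tgt, injected_map=src_map)
```

In this toy setting the target pass keeps the source's frame-to-frame weighting while the content comes from the target's values, which is the intuition behind treating self-attention maps as carriers of temporal structure.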

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis