InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models
Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song

TL;DR
InstructME is a novel instruction-guided music editing framework using latent diffusion models that enhances musical harmony and coherence during editing and remixing tasks.
Contribution
It introduces a multi-scale U-Net, chord progression conditioning, and a chunk transformer to improve music editing quality and handle long-term dependencies.
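To make the chord progression conditioning more concrete, here is a minimal sketch of one way a chord progression could be laid out as a matrix: one row per time frame, one column per chord class, with a 1 marking the active chord. The 24-class vocabulary (12 roots, major/minor), the frame rate, and the names `chord_index` and `chord_progression_matrix` are all assumptions made for illustration; this summary does not specify how the paper actually builds its matrix.

```python
# Hypothetical encoding for illustration only; NOT taken from the paper.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
QUALITIES = ["maj", "min"]  # assumed chord vocabulary: 12 roots x 2 qualities


def chord_index(root, quality):
    """Map a (root, quality) pair to a column index in [0, 24)."""
    return PITCH_CLASSES.index(root) * len(QUALITIES) + QUALITIES.index(quality)


def chord_progression_matrix(segments, num_frames, frames_per_second):
    """Build a one-hot (num_frames x 24) matrix from timed chord segments.

    segments: list of (start_sec, end_sec, root, quality) tuples.
    Frames not covered by any segment stay all-zero ("no chord").
    """
    num_classes = len(PITCH_CLASSES) * len(QUALITIES)
    matrix = [[0] * num_classes for _ in range(num_frames)]
    for start, end, root, quality in segments:
        # Convert segment boundaries from seconds to frame indices.
        first = max(0, int(round(start * frames_per_second)))
        last = min(num_frames, int(round(end * frames_per_second)))
        column = chord_index(root, quality)
        for frame in range(first, last):
            matrix[frame][column] = 1
    return matrix


# Example: two seconds of C major followed by two seconds of A minor,
# sampled at an assumed 2 frames per second.
segments = [(0.0, 2.0, "C", "maj"), (2.0, 4.0, "A", "min")]
matrix = chord_progression_matrix(segments, num_frames=8, frames_per_second=2)
```

A frame-aligned matrix like this could be supplied to a diffusion model alongside the text instruction, so the edited audio is guided toward the original harmony. How InstructME injects the condition is not described in this summary.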
Findings
Outperforms previous methods in music quality and harmony
Effective in instrument editing and remixing tasks
Supports multi-round editing with consistent results
Abstract
Music editing primarily entails the modification of instrument tracks or remixing as a whole, which offers a novel reinterpretation of the original piece through a series of operations. These music processing methods hold immense potential across various applications but demand substantial expertise. Prior methodologies, although effective for image and audio modifications, falter when directly applied to music. This is attributed to music's distinctive data nature, where such methods can inadvertently compromise the intrinsic harmony and coherence of music. In this paper, we develop InstructME, an Instruction-guided Music Editing and remixing framework based on latent diffusion models. Our framework fortifies the U-Net with multi-scale aggregation in order to maintain consistency before and after editing. In addition, we introduce a chord progression matrix as condition information and…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Diffusion
